1. Scope


1.1 This practice covers statistical methodology for assessing the expected agreement between two different standard test methods that purport to measure the same property of a material, and for deciding whether a simple linear bias correction can further improve the expected agreement. It is intended for use with results obtained from interlaboratory studies meeting the requirements of Practice D6300 or equivalent (for example, ISO 4259). The interlaboratory studies shall be conducted on at least ten materials in common that, among them, span the intersecting scopes of the test methods, and results shall be obtained from at least six laboratories using each method. Requirements in this practice shall be met in order for the assessment to be considered suitable for publication in either method, if such publication includes a claim that the assessment was carried out in compliance with this practice. Any such publication shall include mandatory information regarding certain details of the assessment outcome as specified in the Report section of this practice.


1.2 The statistical methodology is based on the premise that 
a bias correction will not be needed. In the absence of strong 
statistical evidence that a bias correction would result in better 
agreement between the two methods, a bias correction is not 
made. If a bias correction is required, then the parsimony 
principle is followed whereby a simple correction is to be 
favored over a more complex one. 


Note 1—Failure to adhere to the parsimony principle generally results 
in models that are over-fitted and do not perform well in practice. 


1.3 The bias corrections of this practice are limited to a 
constant correction, proportional correction, or a linear (pro- 
portional + constant) correction. 
1.4 The bias-correction methods of this practice are method
symmetric, in the sense that equivalent corrections are obtained 
regardless of which method is bias-corrected to match the 
other. 


1.5 A methodology is presented for establishing the numeri- 
cal limit (designated by this practice as the between methods 
reproducibility) that would be exceeded about 5 % of the time 
(one case in 20 in the long run) for the difference between two 
results where each result is obtained by a different operator 
using different apparatus and each applying one of the two 
methods X and Y on identical material, where one of the 
methods has been appropriately bias-corrected in accordance 
with this practice, in the normal and correct operation of both 
test methods. 


Note 2—In earlier versions of this standard practice, the term “cross-method reproducibility” was used in place of the term “between methods reproducibility.” The change was made because “between methods reproducibility” is more intuitive and less confusing. The two terms are synonymous and interchangeable, particularly where the term “cross-method reproducibility” was referenced by name in methods for which a D6708 assessment was performed before the change in terminology in this standard practice was adopted.

Note 3—Users are cautioned against applying the between methods reproducibility as calculated from this practice to materials that are significantly different in composition from those actually studied, as the ability of this practice to detect and address sample-specific biases (see 6.7) depends on the materials selected for the interlaboratory study. When sample-specific biases are present, the types and ranges of samples may need to be expanded significantly beyond the minimum of ten materials specified in this practice in order to obtain a more comprehensive and reliable between methods reproducibility that adequately covers the range of sample-specific biases for different types of materials.


1.6 This practice is intended for test methods which mea- 
sure quantitative (numerical) properties of petroleum or petro- 
leum products. 


1.7 The statistical calculations of this practice are also applicable for assessing the expected agreement between two different test methods that purport to measure the same property of a material using results that are not as described in 1.1, provided the results are obtained on the same comparison sample set, the standard error associated with each test result is known, and the sample set design meets the requirements of this practice, in particular that the statistical degrees of freedom associated with all standard errors are at least 30. Requirements in this practice shall be met in order for the assessment to be considered suitable for publication in either method, if such publication includes a claim that the assessment was carried out in compliance with this practice. Any such publication shall include mandatory information regarding certain details of the assessment as specified in the Report section of this practice.


1.8 The methodology in this practice can also be used to perform linear regression analysis between two variables (X, Y) where there is known uncertainty in both variables that may or may not be constant over the regression range. The common acronym used to describe this type of linear regression is ReXY (Regression with errors in X and Y). The ReXY technique for assessing the correlation between two variables, as described in this practice, can be used for investigative applications where the strict data input requirements may not be met but the outcome can still be useful for the intended application. Use of this practice for ReXY should be conducted under the guidance of subject matter experts familiar with the statistical theory and techniques described in this practice, the methodologies associated with the production and collection of the results to be used for the regression analysis, and the interpretation of the assessment outcome relative to the intended application.


1.9 This international standard was developed in accor- 
dance with internationally recognized principles on standard- 
ization established in the Decision on Principles for the 
Development of International Standards, Guides and Recom- 
mendations issued by the World Trade Organization Technical 
Barriers to Trade (TBT) Committee. 


2. Referenced Documents 


2.1 ASTM Standards:²

D5580 Test Method for Determination of Benzene, Toluene, 
Ethylbenzene, p/m-Xylene, o-Xylene, C9 and Heavier
Aromatics, and Total Aromatics in Finished Gasoline by 
Gas Chromatography 

D5769 Test Method for Determination of Benzene, Toluene, 
and Total Aromatics in Finished Gasolines by Gas 
Chromatography/Mass Spectrometry 

D6299 Practice for Applying Statistical Quality Assurance 
and Control Charting Techniques to Evaluate Analytical 
Measurement System Performance 

D6300 Practice for Determination of Precision and Bias 
Data for Use in Test Methods for Petroleum Products and 
Lubricants 

D7372 Guide for Analysis and Interpretation of Proficiency 
Test Program Results 

2.2 ISO Standard:³

ISO 4259 Petroleum Products—Determination and Applica- 
tion of Precision Data in Relation to Methods of Test 


² For referenced ASTM standards, visit the ASTM website, www.astm.org, or
contact ASTM Customer Service at service@astm.org. For Annual Book of ASTM 
Standards volume information, refer to the standard’s Document Summary page on 
the ASTM website. 

³ Available from American National Standards Institute (ANSI), 25 W. 43rd St.,
4th Floor, New York, NY 10036. 


3. Terminology 


3.1 Definitions: 

3.1.1 between ILCP method-averages reproducibility ($R_{\text{ILCP}_X,\text{ILCP}_Y}$), n—a quantitative expression of the random error associated with the difference between the bias-corrected ILCP average of method X and the ILCP average of method Y from a Proficiency Testing program, when method X has been assessed versus method Y and an appropriate bias correction has been applied to all method X results in accordance with this practice; it is defined as the numerical limit for the difference between two such averages that would be exceeded about 5 % of the time (one case in 20 in the long run).


3.1.2 between-method bias, n—a quantitative expression for 
the mathematical correction that can statistically improve the 
degree of agreement between the expected values of two test 
methods which purport to measure the same property. 


3.1.3 between methods reproducibility ($R_{XY}$), n—a quantitative expression of the random error associated with the difference between two results obtained by different operators using different apparatus and applying the two methods X and Y, respectively, each obtaining a single result on an identical test sample, when the methods have been assessed and an appropriate bias correction has been applied in accordance with this practice; it is defined as the numerical limit for the difference between two such single and independent results that would be exceeded about 5 % of the time (one case in 20 in the long run) in the normal and correct operation of both test methods.

3.1.3.1 Discussion—A statement of between methods repro- 
ducibility shall include a description of any bias correction 
used in accordance with this practice. 

3.1.3.2 Discussion—Between methods reproducibility is a 
meaningful concept only if there are no statistically observable 
sample-specific relative biases between the two methods, or if 
such biases vary from one sample to another in such a way that 
they may be considered random effects. (See 6.7.) 


3.1.4 centered sum of squares (CSS), n—a statistic used to 
quantify the degree of agreement between the results from two 
test methods after bias-correction using the methodology of 
this practice. 


3.1.5 Interlaboratory Crosscheck Program  (ILCP), 
n—ASTM International Proficiency Test Program sponsored 
by Committee D02 on Petroleum Products, Liquid Fuels, and 
Lubricants; see ASTM website for current details. D7372 


3.1.6 total sum of squares (TSS), n—a statistic used to 
quantify the information content from the inter-laboratory 
study in terms of total variation of sample means relative to the 
standard error of each sample mean. 


3.2 Symbols:
$X$, $Y$ = single X-method and Y-method results, respectively
$X_{ijk}$, $Y_{ijk}$ = single results from the X-method and Y-method round robins, respectively
$X_i$, $Y_i$ = means of results on the $i$th round robin sample
$S$ = the number of samples in the round robin
$L_{X,i}$, $L_{Y,i}$ = the numbers of laboratories that returned results on the $i$th round robin sample
$R_X$, $R_Y$ = the reproducibilities of the X- and Y-methods, respectively
$R_{X,i}$, $R_{Y,i}$ = the reproducibilities of methods X and Y, evaluated at the method X and Y means of the $i$th round robin sample, respectively
$R_{\text{ILCP}_X,\text{ILCP}_Y}$ = estimate of between ILCP method-averages reproducibility
$s_{R,X,i}$, $s_{R,Y,i}$ = the reproducibility standard deviations, evaluated at the method X and Y means of the $i$th round robin sample
$s_{r,X,i}$, $s_{r,Y,i}$ = the repeatability standard deviations, evaluated at the method X and Y means of the $i$th round robin sample
$s_{X,i}$, $s_{Y,i}$ = standard errors of the means of the $i$th round robin sample
$\bar{X}$, $\bar{Y}$ = the weighted means of round robins (across samples)
$x_i$, $y_i$ = deviations of the means of the $i$th round robin sample results from $\bar{X}$ and $\bar{Y}$, respectively
$TSS_X$, $TSS_Y$ = total sums of squares, around $\bar{X}$ and $\bar{Y}$
$F$ = a ratio for comparing variances; not unique (more than one use)
$\nu_X$, $\nu_Y$ = the degrees of freedom for reproducibility variances from the round robins
$w_i$ = weight associated with the difference between mean results (or corrected mean results) from the $i$th round robin sample
$CSS$ = centered sum of squares, weighted sum of squared differences between (possibly corrected) mean results from the round robin
$a$, $b$ = parameters of a linear correction: $\hat{Y} = a + bX$
$t_1$, $t_2$ = ratios for assessing reductions in sums of squares
$R_{XY}$ = estimate of between methods reproducibility
$\hat{Y}$ = predicted Y-method value for a sample by applying the bias correction established from this practice to an actual X-method result for the same sample
$\hat{Y}_i$ = predicted $i$th round robin sample Y-method mean, by applying the bias correction established from this practice to its corresponding X-method mean
$\varepsilon_i$ = standardized difference between $Y_i$ and $\hat{Y}_i$
$L_X$, $L_Y$ = harmonic mean numbers of laboratories submitting results on round robin samples, by X- and Y-methods, respectively
$R'_{XY}$ = estimate of between methods reproducibility, computed from an X-method result only


4. Summary of Practice 


4.1 Precisions of the two methods are quantified using
inter-laboratory studies meeting the requirements of Practice 
D6300 or equivalent, using at least ten samples in common that 
span the intersecting scopes of the methods. The arithmetic 
means of the results for each common sample obtained by each 
method are calculated. Estimates of the standard errors of these 
means are computed. 


Note 4—For established standard test methods, new precision studies 
generally will be required in order to meet the common sample require- 
ment. 

Note 5—It is not necessary for both test methods to be run by the same laboratory. If they are, care should be taken to ensure that the independent test result requirement of Practice D6300 is met (for example, by double-blind testing of samples in random order).

4.2 Weighted sums of squares are computed for the total 
variation of the mean results across all common samples for 
each method. These sums of squares are assessed against the 
standard errors of the mean results for each method to ensure 
that the samples are sufficiently varied before continuing with 
the practice. 


4.3 The closeness of agreement of the mean results by each 
method is evaluated using appropriate weighted sums of 
squared differences. Such sums of squares are computed from 
the data first with no bias correction, then with a constant bias 
correction, then, when appropriate, with a proportional 
correction, and finally with a linear (proportional + constant) 
correction. 


4.4 The weighted sum of squared differences for the linear correction is assessed against the total variation in the mean results for both methods to ensure that there is sufficient correlation between the two methods.


4.5 The most parsimonious bias correction is selected. 


4.6 The weighted sum of squares of differences, after applying the selected bias correction, is assessed to determine whether additional unexplained sources of variation remain in the residual (that is, the individual $Y_i$ minus bias-corrected $X_i$) data. Any remaining unexplained variation is attributed to sample-specific biases (also known as method-material interactions, or matrix effects). In the absence of sample-specific biases, the between methods reproducibility is estimated.


4.7 If sample-specific biases are present, the residuals (that is, the individual $Y_i$ minus bias-corrected $X_i$) are tested for randomness. If they are found to be consistent with a random-effects model, their contribution to the between methods reproducibility is estimated and accumulated into an all-encompassing between methods reproducibility estimate.

4.8 Refer to Fig. 1 for a simplified flow diagram of the 
process described in this practice. 
5. Significance and Use 


5.1 This practice can be used to determine if a constant, proportional, or linear bias correction can improve the degree of agreement between two methods that purport to measure the same property of a material.

FIG. 1 Simplified Flow Diagram for this Practice


5.2 The bias correction developed in this practice can be applied to a single result (X) obtained from one test method (method X) to obtain a predicted result (Ŷ) for the other test method (method Y).


Note 6—Users are cautioned to ensure that Ŷ is within the scope of method Y before its use.


5.3 The between methods reproducibility established by this 
practice can be used to construct an interval around Ŷ that 
would contain the result of test method Y, if it were conducted, 
with approximately 95 % probability. 
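As a minimal illustration of 5.2 and 5.3, the sketch below applies a linear correction $\hat{Y} = a + bX$ to a single X-method result and forms the interval $\hat{Y} \pm R_{XY}$, which follows from the definition of between methods reproducibility in 3.1.3. The function names `predict_y` and `y_interval` are hypothetical, and $a$, $b$, and $R_{XY}$ are assumed to come from Section 6 ($a = 0$ and $b = 1$ reproduce the simpler classes).

```python
def predict_y(x: float, a: float = 0.0, b: float = 1.0) -> float:
    """Predicted method Y value from one method X result (5.2): Yhat = a + b*x."""
    return a + b * x


def y_interval(x: float, r_xy: float, a: float = 0.0, b: float = 1.0) -> tuple[float, float]:
    """Interval Yhat +/- R_XY, expected to contain a method Y result about 95 % of the time (5.3)."""
    y_hat = predict_y(x, a, b)
    return (y_hat - r_xy, y_hat + r_xy)


# Hypothetical Class 2 correction a = 1.0, b = 2.0 with R_XY = 0.5
print(y_interval(10.0, 0.5, a=1.0, b=2.0))  # (20.5, 21.5)
```

Per Note 6, a caller would still confirm that the predicted value lies within the scope of method Y before using it.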


5.4 This practice can be used to guide commercial agree- 
ments and product disposition decisions involving test methods 
that have been evaluated relative to each other in accordance 
with this practice. 


5.5 The magnitude of a statistically detectable bias is directly related to the uncertainties of the statistics from the experimental study. These uncertainties are related to both the size of the data set and the precision of the processes being studied. A large data set, highly precise test methods, or both can reduce the uncertainties of the experimental statistics to the point where a "statistically detectable" bias becomes "trivially small," that is, of no practical consequence in the intended use of the test method under study. Therefore, before executing this practice, users are advised to determine the magnitude of bias correction below which they would consider a correction unnecessary or of no practical concern for the intended application.

D6708 - 19a


Note 7—The determination of this minimum bias of no practical concern is not a statistical decision but a subjective one that depends directly on the application requirements of the users.


6. Procedure 


Note 8— For an in-depth statistical discussion of the methodology used 
in this section, see Appendix X1. For a worked example, see Appendix 
X2. 


6.1 Calculate sample means and standard errors from Prac- 
tice D6300 results. 

6.1.1 The process of applying Practice D6300 to the data 
may involve elimination of some results as outliers, and it may 
also involve applying a transformation to the data. For this 
practice, compute the mean results from data that have not 
been transformed, but with outliers removed in accordance 
with Practice D6300. The precision estimates from Practice 
D6300 are used to estimate the standard errors of these means. 

6.1.2 Compute the means as follows: 

6.1.2.1 Let $X_{ijk}$ represent the $k$th result on the $i$th common material by the $j$th lab in the round robin for method X. Similarly for $Y_{ijk}$. (The $i$th material is the same for both round robins, but the $j$th lab in one round robin is not necessarily the same lab as the $j$th lab in the other round robin.) Let $n_{X,ij}$ be the number of results on the $i$th material from the $j$th X-method lab, after removing outliers, that is, the number of results in cell $(i,j)$. Let $L_{X,i}$ be the number of laboratories in the X-method round robin that have at least one result on the $i$th material remaining in the data set, after removal of outliers. Let $S$ be the total number of materials common to both round robins.


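6.1.2.2 The mean X-method result for the $i$th material is:

$$X_i = \frac{1}{L_{X,i}}\sum_{j=1}^{L_{X,i}}\left(\frac{1}{n_{X,ij}}\sum_{k=1}^{n_{X,ij}} X_{ijk}\right) \tag{1}$$

where $X_i$ is the average of the cell averages on the $i$th material by method X.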


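6.1.2.3 Similarly, the mean Y-method result for the $i$th material is:

$$Y_i = \frac{1}{L_{Y,i}}\sum_{j=1}^{L_{Y,i}}\left(\frac{1}{n_{Y,ij}}\sum_{k=1}^{n_{Y,ij}} Y_{ijk}\right) \tag{2}$$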
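Eq 1 and Eq 2 average the cell averages, so each laboratory carries equal weight regardless of how many replicates it reported. A minimal sketch, assuming each material's outlier-free results are supplied as one list per laboratory (the data layout and the name `material_mean` are illustrative assumptions):

```python
from statistics import mean


def material_mean(cells: list[list[float]]) -> float:
    """Mean result for one material per Eq 1 (method X) or Eq 2 (method Y).

    cells[j] holds laboratory j's results after outlier removal. Empty cells are
    skipped, so the divisor is the number of laboratories that still have at
    least one result on this material (L_X,i or L_Y,i).
    """
    cell_averages = [mean(results) for results in cells if results]
    return mean(cell_averages)


# Three laboratories; the first and third reported duplicates.
print(material_mean([[10.0, 10.2], [10.4], [9.8, 10.0]]))  # about 10.133
```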


6.1.3 The standard errors (standard deviations of the means 
of the results) are computed as follows: 

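6.1.3.1 If $s_{R,X,i}$ is the estimated reproducibility standard deviation from the X-method round robin and $s_{r,X,i}$ is the estimated repeatability standard deviation, both evaluated at the method X mean of the $i$th material, then an estimate of the standard error for $X_i$ is given by:

$$s_{X,i} = \sqrt{\frac{1}{L_{X,i}}\left[s_{R,X,i}^2 - s_{r,X,i}^2\left(1 - \frac{1}{L_{X,i}}\sum_{j=1}^{L_{X,i}}\frac{1}{n_{X,ij}}\right)\right]} \tag{3}$$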


Note 9—Since repeatability and reproducibility may vary with X, the $s_{X,i}$ might still differ from one material to the next even if the $L_{X,i}$ were the same for all materials and the $n_{X,ij}$ were the same for all laboratories and all materials.


6.1.3.2 $s_{Y,i}$, the estimated standard error for $Y_i$, is given by an analogous formula.
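Eq 3 can be read as treating each cell average as a laboratory effect (variance $s_R^2 - s_r^2$) plus the mean of $n_{X,ij}$ repeatability errors; the same function serves for method Y per 6.1.3.2. A minimal sketch, with a hypothetical function name and argument layout:

```python
import math


def standard_error(s_R: float, s_r: float, n_per_cell: list[int]) -> float:
    """Standard error of a material mean per Eq 3.

    s_R, s_r: reproducibility and repeatability standard deviations evaluated
    at this material's mean; n_per_cell: number of retained results in each
    laboratory cell (one entry per laboratory, so len() gives L_X,i).
    """
    L = len(n_per_cell)
    mean_inverse_n = sum(1.0 / n for n in n_per_cell) / L
    # Replicates inside a cell average out only the repeatability part of s_R^2.
    variance = (s_R**2 - s_r**2 * (1.0 - mean_inverse_n)) / L
    return math.sqrt(variance)


print(standard_error(0.5, 0.2, [1, 1, 1, 1]))  # 0.25: single results give s_R / sqrt(L)
```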


6.2 Calculate the total variation sum of squares for each 
method, and determine whether the samples can be distin- 
guished from each other by both methods. 

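6.2.1 The total sums of squares (TSS) are given by:

$$TSS_X = \sum_{i=1}^{S}\left(\frac{X_i - \bar{X}}{s_{X,i}}\right)^2 \quad\text{and}\quad TSS_Y = \sum_{i=1}^{S}\left(\frac{Y_i - \bar{Y}}{s_{Y,i}}\right)^2 \tag{4}$$

where:

$$\bar{X} = \frac{\sum_i X_i / s_{X,i}^2}{\sum_i 1 / s_{X,i}^2} \quad\text{and}\quad \bar{Y} = \frac{\sum_i Y_i / s_{Y,i}^2}{\sum_i 1 / s_{Y,i}^2} \tag{5}$$

are weighted averages of all $X_i$'s and $Y_i$'s, respectively.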


6.2.2 Compare $F = TSS_X/(S-1)$ to the 95th percentile of Fisher's F distribution with $S-1$ and $\nu_X$ degrees of freedom for the numerator and denominator, respectively, where $\nu_X$ is the degrees of freedom for the reproducibility variance (Practice D6300, paragraph 8.3.3.3) for the X-method round robin. If F does not exceed the 95th percentile, then the X-method is not sufficiently precise to distinguish among the S samples. Do not proceed with this practice, as meaningful results cannot be produced.

6.2.3 In a similar manner, compare $F = TSS_Y/(S-1)$ to the 95th percentile of Fisher's F distribution, using the degrees of freedom of the reproducibility variance of the Y-method, $\nu_Y$, in place of $\nu_X$. Similarly, do not proceed with this practice if F does not exceed the 95th percentile.
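The screening in 6.2 can be sketched as below. The 95th percentile of F is passed in rather than computed, to keep the example free of third-party dependencies; with SciPy it could be obtained as `scipy.stats.f.ppf(0.95, S - 1, nu)`. The function names are illustrative only.

```python
def total_sum_of_squares(means: list[float], std_errors: list[float]) -> float:
    """TSS per Eq 4, centred on the inverse-variance weighted average of Eq 5."""
    weights = [1.0 / s**2 for s in std_errors]
    center = sum(w * m for w, m in zip(weights, means)) / sum(weights)
    return sum(((m - center) / s) ** 2 for m, s in zip(means, std_errors))


def samples_distinguishable(means: list[float], std_errors: list[float], f_crit_95: float) -> bool:
    """6.2.2/6.2.3: F = TSS/(S - 1) must exceed the 95th percentile of F(S - 1, nu)."""
    S = len(means)
    return total_sum_of_squares(means, std_errors) / (S - 1) > f_crit_95


print(total_sum_of_squares([1.0, 2.0, 3.0], [1.0, 1.0, 1.0]))  # 2.0
```

The same two functions serve for method X (with $\nu_X$) and method Y (with $\nu_Y$).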


Note 10—If one or both of the conditions of 6.2.2 and 6.2.3 are 
satisfied only marginally, it is unlikely that this practice will produce a 
meaningful outcome. The test in the next subsection will almost certainly 
fail. 

6.3 Test whether the methods are sufficiently correlated. 

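6.3.1 Using the weights ($w_i$) as computed in 6.4.1.1 (Eq 8), calculate the weighted correlation coefficient $r$:

$$r = \frac{\sum_i w_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_i w_i (X_i - \bar{X})^2 \sum_i w_i (Y_i - \bar{Y})^2}} \tag{6}$$

where $\bar{X}$ and $\bar{Y}$ are $\sum_i w_i X_i / \sum_i w_i$ and $\sum_i w_i Y_i / \sum_i w_i$, respectively.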


6.3.2 Use $r$ to calculate the F-statistic:

$$F = \frac{(S-2)\,r^2}{1 - r^2} \tag{7}$$


6.3.3 Compare F to the 99th percentile of Fisher's F distribution with 1 and $S-2$ degrees of freedom in the numerator and denominator, respectively.

6.3.3.1 If F is less than the 99th percentile value, then this practice concludes that the methods are too discordant to permit use of the results from one method to predict those of the other.

6.3.3.2 If F is greater than the 99th percentile value, proceed to 6.5.
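The correlation screen of 6.3 reduces to Eqs 6 and 7. The sketch assumes the weights of 6.4.1.1 are supplied and that $|r| < 1$ (Eq 7 is undefined at $|r| = 1$); the 99th percentile of $F(1, S-2)$ is passed in by the caller (for example from `scipy.stats.f.ppf(0.99, 1, S - 2)`). The names are hypothetical.

```python
import math


def weighted_correlation(X: list[float], Y: list[float], w: list[float]) -> float:
    """Weighted correlation coefficient r of Eq 6."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, X)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, Y)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, X, Y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, X))
    syy = sum(wi * (yi - y_bar) ** 2 for wi, yi in zip(w, Y))
    return sxy / math.sqrt(sxx * syy)


def methods_correlated(r: float, S: int, f_crit_99: float) -> bool:
    """6.3.2/6.3.3: F = (S - 2) r^2 / (1 - r^2) must exceed the 99th percentile of F(1, S - 2)."""
    F = (S - 2) * r**2 / (1.0 - r**2)
    return F > f_crit_99


print(weighted_correlation([1.0, 2.0, 3.0], [1.0, 3.0, 2.0], [1.0, 1.0, 1.0]))  # 0.5
```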


6.4 Calculate the centered sum of squares (CSS) statistic for 
each of the following classes of bias-correction methodology. 


Note 11—The revised algorithms presented in this version of D6708 were developed to correct very rare cases in which the algorithms of previous versions do not converge to the optimal linear models. These rare cases generally involved data sets with poor correlations between the two methods. In the vast majority of data sets, including the worked example of this practice, the old and the new algorithms converge to exactly the same optimal models. Continuing to use the old algorithms is a reasonable option provided the user verifies that the computed value of $CSS_{1b}$ is never larger than $CSS_0$, and that the computed value of $CSS_2$ is never larger than either $CSS_{1a}$ or $CSS_{1b}$. If either condition is violated when using the old algorithms, the outcome from this version is deemed to be the correct outcome.


6.4.1 Class 0—No bias correction. 
6.4.1.1 Compute the weights ($w_i$) for each sample $i$:

$$w_i = \frac{1}{s_{X,i}^2 + s_{Y,i}^2} \tag{8}$$

6.4.1.2 Compute CSS:

$$CSS_0 = \sum_i w_i (X_i - Y_i)^2 \tag{9}$$
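Class 0 needs no iteration: the weights of Eq 8 combine the two standard errors of each sample, and Eq 9 sums the weighted squared differences. A minimal sketch with a hypothetical function name, taking the $X_i$, $Y_i$, $s_{X,i}$, and $s_{Y,i}$ of 6.1 as lists:

```python
def class0(X: list[float], Y: list[float], sX: list[float], sY: list[float]) -> tuple[list[float], float]:
    """Return the Class 0 weights (Eq 8) and CSS_0 (Eq 9)."""
    w = [1.0 / (sx**2 + sy**2) for sx, sy in zip(sX, sY)]
    css0 = sum(wi * (xi - yi) ** 2 for wi, xi, yi in zip(w, X, Y))
    return w, css0


print(class0([1.0, 2.0], [1.5, 2.0], [0.5, 0.5], [0.5, 0.5]))  # ([2.0, 2.0], 0.5)
```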


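6.4.2 Class 1a—Constant bias correction.
6.4.2.1 Using the weights ($w_i$) from 6.4.1.1, compute the constant bias correction ($a$):

$$a = \frac{\sum_i w_i (Y_i - X_i)}{\sum_i w_i} = \frac{\sum_i w_i Y_i}{\sum_i w_i} - \frac{\sum_i w_i X_i}{\sum_i w_i} \tag{10}$$

6.4.2.2 Compute CSS:

$$CSS_{1a} = \sum_i w_i \left(Y_i - (X_i + a)\right)^2 \tag{11}$$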
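Because the Class 1a weights do not depend on $a$, Eq 10 is simply the weighted mean difference and no iteration is required. A minimal sketch (hypothetical name; the weights are assumed to be those from Eq 8):

```python
def class1a(X: list[float], Y: list[float], w: list[float]) -> tuple[float, float]:
    """Return the constant correction a (Eq 10) and CSS_1a (Eq 11), using the Class 0 weights."""
    a = sum(wi * (yi - xi) for wi, xi, yi in zip(w, X, Y)) / sum(w)
    css1a = sum(wi * (yi - (xi + a)) ** 2 for wi, xi, yi in zip(w, X, Y))
    return a, css1a


print(class1a([1.0, 2.0], [2.0, 2.0], [1.0, 1.0]))  # (0.5, 0.5)
```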


6.4.3 Class 1b—Proportional bias correction. 

6.4.3.1 The computations of this subsection (6.4.3) are appropriate only if both of the following conditions apply: (1) the measured property assumes only non-negative values, and (2) a property value of zero has a physical significance (for example, concentrations of specific constituents). In addition, although not mandatory, it is highly recommended that $\max(Y_i) > 2\min(Y_i)$.

6.4.3.2 The computations involve iterative calculation of the weights $\{w_i\}$ and the proportional correction $b$.

6.4.3.3 Set b = 1. 

6.4.3.4 Compute the weight $w_i$ for each sample $i$:

$$w_i = \frac{1}{s_{Y,i}^2 + b^2 s_{X,i}^2} \tag{12}$$


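6.4.3.5 Calculate the following three sums:

$$A = \sum_i w_i^2 X_i Y_i s_{X,i}^2 \tag{13}$$

$$B = \sum_i w_i^2 \left(X_i^2 s_{Y,i}^2 - Y_i^2 s_{X,i}^2\right) \tag{14}$$

$$C = -\sum_i w_i^2 X_i Y_i s_{Y,i}^2 \tag{15}$$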


6.4.3.6 Calculate $b_0$:

$$b_0 = \frac{-B + \sqrt{B^2 - 4AC}}{2A} \tag{16}$$


6.4.3.7 If $|b - b_0| > 0.001\,b$, replace $b$ with $b_0$ and go back to 6.4.3.4. Otherwise, the iteration can be stopped, as further iteration will not produce meaningful improvement. Replace $b$ with $b_0$ and go on to 6.4.3.8.

6.4.3.8 Calculate the final weights $\{w_i\}$ as in 6.4.3.4.

6.4.3.9 Calculate CSS:

$$CSS_{1b} = \sum_i w_i (Y_i - bX_i)^2 \tag{17}$$
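The iteration of 6.4.3.3 to 6.4.3.9 is sketched below. The weights depend on $b$ (Eq 12), so they are recomputed on every pass. For $b_0$ the sketch takes the root $(-B + \sqrt{B^2 - 4AC})/(2A)$ of the quadratic $Ab^2 + Bb + C = 0$ built from Eqs 13 to 15; that root choice, the iteration cap `max_iter`, and the use of $|b|$ in the tolerance are assumptions of this illustration. It also assumes $A > 0$, which holds for the positive-valued data to which 6.4.3.1 restricts this class.

```python
import math


def class1b(X, Y, sX, sY, tol=0.001, max_iter=100):
    """Proportional correction Yhat = b*X; returns (b, final weights, CSS_1b)."""
    b = 1.0  # 6.4.3.3
    for _ in range(max_iter):
        w = [1.0 / (sy**2 + b**2 * sx**2) for sx, sy in zip(sX, sY)]  # Eq 12
        rows = list(zip(w, X, Y, sX, sY))
        A = sum(wi**2 * xi * yi * sx**2 for wi, xi, yi, sx, sy in rows)  # Eq 13
        B = sum(wi**2 * (xi**2 * sy**2 - yi**2 * sx**2) for wi, xi, yi, sx, sy in rows)  # Eq 14
        C = -sum(wi**2 * xi * yi * sy**2 for wi, xi, yi, sx, sy in rows)  # Eq 15
        b0 = (-B + math.sqrt(B**2 - 4.0 * A * C)) / (2.0 * A)  # Eq 16 (assumed root)
        converged = abs(b - b0) <= tol * abs(b)  # 6.4.3.7
        b = b0
        if converged:
            break
    w = [1.0 / (sy**2 + b**2 * sx**2) for sx, sy in zip(sX, sY)]  # 6.4.3.8 final weights
    css1b = sum(wi * (yi - b * xi) ** 2 for wi, xi, yi in zip(w, X, Y))  # Eq 17
    return b, w, css1b


b, w, css1b = class1b([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0] * 3, [1.0] * 3)
print(round(b, 6), round(css1b, 12))  # 2.0 0.0
```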


6.4.4 Class 2—Linear (proportional + constant) bias correc- 
tion. 


6.4.4.1 This involves iterative calculation of the weights $\{w_i\}$, the weighted means of the $X_i$'s and $Y_i$'s, and the proportional term $b$.


6.4.4.2 Set $b = 1$.
6.4.4.3 Compute the weight $w_i$ for each sample $i$:

$$w_i = \frac{1}{s_{Y,i}^2 + b^2 s_{X,i}^2} \tag{18}$$


6.4.4.4 Calculate the weighted means of $\{X_i\}$ and $\{Y_i\}$, respectively:

$$\bar{X} = \frac{\sum_i w_i X_i}{\sum_i w_i}, \qquad \bar{Y} = \frac{\sum_i w_i Y_i}{\sum_i w_i} \tag{19}$$


6.4.4.5 Calculate the deviations from the weighted means:

$$x_i = X_i - \bar{X}, \qquad y_i = Y_i - \bar{Y} \tag{20}$$


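6.4.4.6 Calculate the three sums:

$$A = \sum_i w_i^2 x_i y_i s_{X,i}^2 \tag{21}$$

$$B = \sum_i w_i^2 \left(x_i^2 s_{Y,i}^2 - y_i^2 s_{X,i}^2\right) \tag{22}$$

$$C = -\sum_i w_i^2 x_i y_i s_{Y,i}^2 \tag{23}$$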


6.4.4.7 Calculate $b_0$:

$$b_0 = \frac{-B + \sqrt{B^2 - 4AC}}{2A} \tag{24}$$


6.4.4.8 If $|b - b_0| > 0.001\,b$, replace $b$ with $b_0$ and go back to 6.4.4.3, computing new values for the weights $\{w_i\}$, $\bar{X}$, $\bar{Y}$, $\{x_i\}$, $\{y_i\}$, and $b_0$. Otherwise, the iteration can be stopped, as further iteration will not produce meaningful improvement. Replace $b$ with $b_0$ and go to 6.4.4.9.

6.4.4.9 Calculate the final weights $\{w_i\}$ as in 6.4.4.3.


6.4.4.10 Calculate $CSS_2$ and $a$:

$$CSS_2 = \sum_i w_i (y_i - b x_i)^2 \tag{25}$$

$$a = \bar{Y} - b\bar{X} \tag{26}$$
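Class 2 applies the same kind of iteration to deviations from the weighted means (6.4.4.2 to 6.4.4.10). In this sketch the root taken for Eq 24, the iteration cap `max_iter`, and the use of $|b|$ in the tolerance are illustrative assumptions; the weighted means are refreshed with the final weights before Eqs 25 and 26 are evaluated, which is one reasonable reading of 6.4.4.9. Uncorrelated data ($A = 0$) would fail, but 6.3 is intended to screen such data out. Names are hypothetical.

```python
import math


def class2(X, Y, sX, sY, tol=0.001, max_iter=100):
    """Linear correction Yhat = a + b*X; returns (a, b, CSS_2)."""

    def weights_and_means(b):
        w = [1.0 / (sy**2 + b**2 * sx**2) for sx, sy in zip(sX, sY)]  # Eq 18
        sw = sum(w)
        x_bar = sum(wi * xi for wi, xi in zip(w, X)) / sw  # Eq 19
        y_bar = sum(wi * yi for wi, yi in zip(w, Y)) / sw
        return w, x_bar, y_bar

    b = 1.0  # 6.4.4.2
    for _ in range(max_iter):
        w, x_bar, y_bar = weights_and_means(b)
        x = [xi - x_bar for xi in X]  # Eq 20
        y = [yi - y_bar for yi in Y]
        rows = list(zip(w, x, y, sX, sY))
        A = sum(wi**2 * xi * yi * sx**2 for wi, xi, yi, sx, sy in rows)  # Eq 21
        B = sum(wi**2 * (xi**2 * sy**2 - yi**2 * sx**2) for wi, xi, yi, sx, sy in rows)  # Eq 22
        C = -sum(wi**2 * xi * yi * sy**2 for wi, xi, yi, sx, sy in rows)  # Eq 23
        b0 = (-B + math.sqrt(B**2 - 4.0 * A * C)) / (2.0 * A)  # Eq 24 (assumed root)
        converged = abs(b - b0) <= tol * abs(b)  # 6.4.4.8
        b = b0
        if converged:
            break
    w, x_bar, y_bar = weights_and_means(b)  # 6.4.4.9, means refreshed with final weights
    css2 = sum(wi * ((yi - y_bar) - b * (xi - x_bar)) ** 2 for wi, xi, yi in zip(w, X, Y))  # Eq 25
    a = y_bar - b * x_bar  # Eq 26
    return a, b, css2


a, b, css2 = class2([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0], [1.0] * 4, [1.0] * 4)
print(round(a, 6), round(b, 6), round(css2, 12))  # 1.0 2.0 0.0
```

With this root, a negatively correlated data set also yields a negative $b$.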


6.5 Conduct tests to select the most parsimonious bias 
correction class needed. 


6.5.1 The centered sums of squares for differences from each class of bias correction are used to select the most parsimonious bias correction class that can improve the expected degree of agreement between Ŷ (the Y-method result predicted from the X-method result) and the actual Y-method result on the same material. The classes of bias correction and the associated CSS as calculated earlier are repeated in the following table.


Bias Correction Class                                        CSS
Class 0: no correction                                       $CSS_0$
Class 1a: constant bias correction                           $CSS_{1a}$
Class 1b: proportional bias correction (when appropriate)    $CSS_{1b}$
Class 2: linear (proportional + constant) bias correction    $CSS_2$
6.5.2 To determine whether any bias correction (Class 1a, 1b, or 2 above) can significantly improve the expected agreement between the two methods, calculate the following ratio:

$$F = \frac{(CSS_0 - CSS_2)/2}{CSS_2/(S-2)} \tag{27}$$

6.5.2.1 Compare F to the upper 95th percentile of the F distribution with 2 and $S-2$ degrees of freedom for the numerator and denominator, respectively.

6.5.2.2 If the calculated F is smaller, conclude that a bias correction of Class 1a, 1b, or 2 does not sufficiently improve the expected agreement between the two methods, relative to Class 0 (no bias correction). Proceed to 6.6.

6.5.2.3 If the calculated F is larger, conclude that a correc- 
tion can improve the expected agreement between the two 
methods, and continue in 6.5.3. 
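The decision in 6.5.2 compares the reduction from $CSS_0$ to $CSS_2$ (two fitted parameters) with the Class 2 residual variance. A minimal sketch; the 95th percentile of $F(2, S-2)$ is supplied by the caller (for example `scipy.stats.f.ppf(0.95, 2, S - 2)`), and the function names are hypothetical:

```python
def any_correction_f(css0: float, css2: float, S: int) -> float:
    """Eq 27: F = [(CSS_0 - CSS_2) / 2] / [CSS_2 / (S - 2)]."""
    return ((css0 - css2) / 2.0) / (css2 / (S - 2))


def correction_improves_agreement(css0: float, css2: float, S: int, f_crit_95: float) -> bool:
    """6.5.2.2/6.5.2.3: True means continue in 6.5.3; False means keep Class 0 and go to 6.6."""
    return any_correction_f(css0, css2, S) > f_crit_95


print(any_correction_f(30.0, 10.0, 12))  # 10.0
```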

6.5.3 If the F-value calculated in 6.5.2 is larger than the 95th percentile of F, compute the following t-ratios:

$$t_1 = \sqrt{\frac{CSS_1 - CSS_2}{CSS_2/(S-2)}}, \qquad t_2 = \sqrt{\frac{CSS_0 - CSS_1}{CSS_2/(S-2)}} \tag{28}$$

where $CSS_1$ is the lesser of $CSS_{1a}$ or $CSS_{1b}$, provided the latter is appropriate and has been calculated.


6.5.3.1 Compare $t_1$ to the upper 97.5th percentile of the $t$ distribution with $S-2$ degrees of freedom.

6.5.3.2 If $t_1$ is larger, conclude that a bias correction of Class 2 (proportional + constant correction) can improve the expected agreement over that of a single-term (constant or proportional) correction alone (Class 1). Proceed to 6.6.

6.5.3.3 If $t_1$ is smaller than the $t$ percentile, compare $t_2$ to the same upper 97.5th percentile of the $t$ distribution with $S-2$ degrees of freedom.

6.5.3.4 If $t_2$ is larger, conclude that a single-term bias correction of Class 1 is preferred to a bias correction of Class 2. Use the constant correction unless $CSS_{1b}$ is appropriate and is smaller than $CSS_{1a}$. Proceed to 6.6.

6.5.3.5 If $t_2$ is smaller, then neither $t_1$ nor $t_2$ is statistically significant. A bias correction of Class 2 is preferred over a single-term (constant or proportional) correction of Class 1.
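The decision sequence of 6.5.3 can be expressed as a small function. It assumes the t-ratios have the form $t_1 = \sqrt{(CSS_1 - CSS_2)/(CSS_2/(S-2))}$ and $t_2 = \sqrt{(CSS_0 - CSS_1)/(CSS_2/(S-2))}$, that is, single-degree-of-freedom tests scaled by the Class 2 residual variance; it clamps tiny negative differences from rounding to zero and expects the caller to supply the upper 97.5th percentile of $t$ with $S-2$ degrees of freedom (for example `scipy.stats.t.ppf(0.975, S - 2)`). The function name and the string class labels are hypothetical.

```python
import math
from typing import Optional


def select_correction_class(css0: float, css1a: float, css2: float, S: int,
                            t_crit_975: float, css1b: Optional[float] = None) -> str:
    """Return "1a", "1b", or "2" following 6.5.3 (call only after Eq 27 is significant)."""
    # CSS_1 is the lesser of CSS_1a and CSS_1b when Class 1b is appropriate.
    css1 = css1a if css1b is None else min(css1a, css1b)
    scale = css2 / (S - 2)
    t1 = math.sqrt(max(css1 - css2, 0.0) / scale)
    t2 = math.sqrt(max(css0 - css1, 0.0) / scale)
    if t1 > t_crit_975:
        return "2"  # 6.5.3.2: the second term is worth adding
    if t2 > t_crit_975:
        # 6.5.3.4: single-term correction; constant unless CSS_1b is smaller
        return "1b" if css1b is not None and css1b < css1a else "1a"
    return "2"  # 6.5.3.5: neither single term is significant on its own
```

Class 0 is never returned here because 6.5.3 is reached only after Eq 27 has already favored some correction.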


6.6 Test for existence of sample-specific biases. 

6.6.1 Compare the CSS of the bias-correction class selected in 6.5 to the 95th percentile value of a chi-square distribution with $\nu$ degrees of freedom,

where:

$\nu$ = $S$ for Class 0 (no bias) correction,

$\nu$ = $S-1$ for Class 1a or Class 1b (constant or proportional) correction, and

$\nu$ = $S-2$ for Class 2 (linear) correction.
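A sketch of the 6.6.1 check, mapping the selected class to its degrees of freedom (S minus the number of fitted correction terms). The chi-square percentile is supplied by the caller (for example `scipy.stats.chi2.ppf(0.95, nu)`); the class labels are an assumed string encoding and the names are hypothetical.

```python
FITTED_PARAMETERS = {"0": 0, "1a": 1, "1b": 1, "2": 2}


def chi_square_dof(S: int, bias_class: str) -> int:
    """Degrees of freedom nu of 6.6.1."""
    return S - FITTED_PARAMETERS[bias_class]


def sample_specific_bias_indicated(css: float, chi2_crit_95: float) -> bool:
    """True when CSS exceeds the chi-square percentile (go to 6.6.3); False leads to 6.6.2."""
    return css > chi2_crit_95


print(chi_square_dof(12, "1b"))  # 11
```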


6.6.2 If the CSS is smaller than the chi-square percentile, it is reasonable to conclude that there are no sample-specific biases, that is, that there are no other sources of variation that are statistically observable above the measurement error. Perform the Anderson-Darling (A-D) assessment on the residuals as per 6.7.2.2 and 6.7.2.3. If the outcome is not significant at the 5 % level, calculate the between methods reproducibility ($R_{XY}$) using Eq 29. If the A-D assessment is significant, application of the practice is considered terminated with failure at this point, as the statistical evidence suggests that a single between methods reproducibility ($R_{XY}$) cannot be found that is applicable to all materials covered by the intersecting scope of both test methods. It is reasonable to conclude that, at least for some materials, the test methods are not measuring the same property.

$$R_{XY} = \sqrt{\frac{R_Y^2 + b^2 R_X^2}{2}} \tag{29}$$

where:

$b$ = the coefficient of the appropriate bias correction. (For Class 0 and Class 1a bias corrections, $b = 1$.)
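Eq 29 is the root mean square of $R_Y$ and the scaled $bR_X$. Assuming each reproducibility equals $1.96\sqrt{2}$ times its standard deviation, this works out to 1.96 times the standard deviation of $Y - \hat{Y}$ for one pair of single results, consistent with the definition in 3.1.3. A minimal sketch with a hypothetical function name:

```python
import math


def between_methods_reproducibility(R_X: float, R_Y: float, b: float = 1.0) -> float:
    """Eq 29: R_XY = sqrt((R_Y**2 + b**2 * R_X**2) / 2); use b = 1 for Class 0 and Class 1a."""
    return math.sqrt((R_Y**2 + b**2 * R_X**2) / 2.0)


print(between_methods_reproducibility(R_X=1.0, R_Y=2.0, b=2.0))  # 2.0
```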


6.6.3 If the CSS is larger than the chi-square percentile (see 6.6.1), there is strong evidence that biases between the methods have not been adequately corrected by the bias corrections of 6.4. In other words, the relative biases are not consistent across the S common samples of the round robins. The user may wish to investigate whether the biases can be attributed to other observable properties of the samples, or to restrict attention to a smaller class of materials for the purpose of establishing a between methods reproducibility. Such investigations are beyond the scope of this practice, as the issues typically are not statistical in nature. This practice does recommend investigating whether it is reasonable to treat the sample-specific biases as random effects, as described in 6.7.


6.7 Treatment of Sample-Specific Relative Bias as a Vari- 
ance Component: 

6.7.1 If the CSS exceeds the 95th percentile value of the appropriate chi-square distribution (see 6.6.1), there is strong evidence that sources other than measurement error are contributing to the variation of the expected agreement between the two methods. In this practice, these sources are attributed to sample-specific effects (also known as matrix effects or method-material interactions). In some cases these sample-specific effects can be treated as random effects, and hence can be incorporated as an additional source of variation into a between methods reproducibility as described in this section. Note that, even when it is appropriate to treat these sample-specific effects as random, the additional variation may cause the between methods reproducibility to be far larger than the root mean square of the reproducibilities of the methods (Eq 29).

6.7.2 Examine residuals to assess reasonableness of random 
effect assumption. 

6.7.2.1 Assess the reasonableness of the assumption that the 
sample-specific biases can be treated as random effects by 
examination of the distribution of the residuals. While there are 
numerous statistical tools available to perform this assessment, 
this practice recommends use of the Anderson-Darling normal- 
ity test, based on its simplicity and ease of use. It is not the 
intent of this practice to exclude other tools for this purpose. 

6.7.2.2 Let {Ŷ_i} be the Y-method values predicted from the
corresponding X-method mean values {X_i}, using the bias-
correction selected in 6.5. The (standardized) residuals {ε_i} are
given by:

$$\varepsilon_i = \sqrt{w_i}\,\left(Y_i - \hat{Y}_i\right) \tag{30}$$

where:
{w_i} = the appropriate weights from 6.4.1 to 6.4.4.
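A minimal sketch of Eq 30 for a linear bias correction Ŷ_i = a + bX_i follows. The weights are an assumption: it takes w_i = 1/(b²s_X,i² + s_Y,i²), the reciprocal of the variance of Y_i − Ŷ_i that appears in Eq X1.7; the practice's own weights are those defined in 6.4.1 to 6.4.4 and should be substituted if they differ. The function name and inputs are hypothetical.

```python
import math


def standardized_residuals(x_means, y_means, sx, sy, a, b):
    """Sketch of Eq 30: eps_i = sqrt(w_i) * (Y_i - Yhat_i).

    Assumed weights: w_i = 1 / (b**2 * sx_i**2 + sy_i**2). Replace them with
    the weights of 6.4.1 to 6.4.4 if those differ.
    """
    residuals = []
    for x, y, sx_i, sy_i in zip(x_means, y_means, sx, sy):
        y_hat = a + b * x                        # bias-corrected X-method mean
        w = 1.0 / (b ** 2 * sx_i ** 2 + sy_i ** 2)
        residuals.append(math.sqrt(w) * (y - y_hat))
    return residuals


# Hypothetical two-sample illustration with no correction (a = 0, b = 1).
eps = standardized_residuals([10.0, 20.0], [10.5, 19.0], [0.3, 0.4], [0.4, 0.3], a=0.0, b=1.0)
css = sum(e ** 2 for e in eps)  # with these weights the sum of squares equals CSS
```

With these assumed weights the squared residuals sum to CSS, which ties the normality check of 6.7.2 back to the statistic tested in 6.6.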


6.7.2.3 Calculate the Anderson-Darling (AD) statistic on the
residuals {ε_i}. (Refer to Practice D6299 for guidance on
calculation and interpretation of this statistic.)

6.7.2.4 If the AD statistic is not significant at the 5 % 
significance level, conclude that the sample-specific relative 
bias may be treated as a variance component. Proceed to 6.7.3. 

6.7.2.5 If the AD statistic is significant, there is strong
evidence that the sample-specific effects cannot be treated as
random effects. Application of this practice is considered
terminated at this point, as the statistical evidence suggests that
a single between methods reproducibility (R_XY) cannot be
found that is applicable to all materials covered by the
intersecting scope of both test methods. It is reasonable to
conclude that, at least for some materials, the test methods are
not measuring the same property. Do NOT proceed to 6.7.3.
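The decision in 6.7.2.4 and 6.7.2.5 can be sketched as below. This uses the standard Anderson-Darling statistic for normality with mean and standard deviation estimated from the residuals, the Stephens small-sample adjustment A*² = A²(1 + 0.75/n + 2.25/n²), and an assumed 5 % critical value of 0.752 for A*². Practice D6299 governs the procedure actually required; the function names are hypothetical.

```python
import math


def anderson_darling_normal(values):
    """Adjusted Anderson-Darling statistic A*^2 for a normality test.

    Mean and standard deviation are estimated from the data, and the
    Stephens small-sample adjustment is applied.
    """
    n = len(values)
    if n < 3:
        raise ValueError("at least three residuals are needed")
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    tiny = 1e-12  # keep CDF values away from 0 and 1 so log() is defined
    cdf = []
    for v in sorted(values):
        p = 0.5 * (1.0 + math.erf((v - mean) / (sd * math.sqrt(2.0))))
        cdf.append(min(max(p, tiny), 1.0 - tiny))
    total = 0.0
    for i in range(1, n + 1):
        total += (2 * i - 1) * (math.log(cdf[i - 1]) + math.log(1.0 - cdf[n - i]))
    a2 = -n - total / n
    return a2 * (1.0 + 0.75 / n + 2.25 / n ** 2)


AD_CRITICAL_5PCT = 0.752  # assumed 5 % critical value for the adjusted statistic


def random_effect_plausible(residuals):
    """True when the AD statistic is not significant at the 5 % level (6.7.2.4)."""
    return anderson_darling_normal(residuals) <= AD_CRITICAL_5PCT
```

A single gross outlier among otherwise tight residuals drives A*² well above the critical value, which corresponds to the termination case of 6.7.2.5.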


Note 12—It is possible that, by restricting the comparison to a narrower 
class of materials, a between methods reproducibility can be obtained (for 
that narrower class) that does not have sample-specific biases, or that has
sample-specific biases that can be treated as a random effect. However, 
individual outlier materials should not be excluded from this study, 
after-the-fact, based on the statistics only, without other evidence that they 
clearly belong to a separate and identifiable class. 


6.7.3 Calculate the between methods reproducibility (R_XY)
as follows:


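$$R_{XY} = \sqrt{\frac{b^2 R_X^2 + R_Y^2}{2}\left[1 + \frac{2(1.96)^2\,(CSS - S + k)}{\sum_{i=1}^{S} \dfrac{b^2 R_{X,i}^2 + R_{Y,i}^2}{b^2 s_{X,i}^2 + s_{Y,i}^2}}\right]} \tag{31}$$

where b and CSS are appropriate to the selected bias-
correction; k is 0 if the bias-correction is Class 0, 1 if the
bias-correction is Class 1a or Class 1b, or 2 if the
bias-correction is Class 2; R_X,i and R_Y,i are the
reproducibilities of the two methods evaluated at the level of
sample i; and s_X,i and s_Y,i are the standard errors of the
corresponding sample means.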

Note 13—Eq 31 provides an estimate of the magnitude below which 
about 95 % of the differences are expected to fall, when one party uses the 
bias-corrected X-method while another party uses the Y-method, on 
materials similar to the round robin samples. Application of the methods 
to materials which are substantially different from these round robin 
materials may affect both the average bias and the variance of the random 
component. Laboratories which engage in routine substitution of one 
method for another are advised to periodically monitor the deviations 
between methods, as a regular part of their quality assurance program. 
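Eq 29 and Eq 31 can be sketched as below. Here rx and ry are R_X and R_Y evaluated at the level of interest, and the per-sample lists hold R_X,i, R_Y,i, s_X,i and s_Y,i for the S round robin samples. Eq 31 is intended only for the situation of 6.7, where CSS significantly exceeds its chi-square percentile and the residual check of 6.7.2 has been passed; the sketch does not test those conditions. Function names are hypothetical.

```python
import math


def rxy_eq29(b, rx, ry):
    """Eq 29: between methods reproducibility with no sample-specific bias."""
    return math.sqrt((b ** 2 * rx ** 2 + ry ** 2) / 2.0)


def rxy_eq31(b, rx, ry, css, k, rx_i, ry_i, sx_i, sy_i):
    """Eq 31: Eq 29 inflated by the excess of CSS over its degrees of freedom.

    k is 0 (Class 0), 1 (Class 1a or 1b), or 2 (Class 2); S = len(rx_i).
    """
    s = len(rx_i)
    denom = sum(
        (b ** 2 * rxi ** 2 + ryi ** 2) / (b ** 2 * sxi ** 2 + syi ** 2)
        for rxi, ryi, sxi, syi in zip(rx_i, ry_i, sx_i, sy_i)
    )
    excess = css - (s - k)  # CSS - v, with v = S - k
    factor = 1.0 + 2.0 * 1.96 ** 2 * excess / denom
    return math.sqrt((b ** 2 * rx ** 2 + ry ** 2) / 2.0 * factor)
```

When CSS equals its degrees of freedom the bracketed factor is 1 and Eq 31 collapses to Eq 29, which is a convenient consistency check on any implementation.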
6.8 Construction of an interval, using a single bias-corrected
result from method X and R_XY, that may contain, about 95 % of
the time, a single result from method Y, if the latter is
conducted on the same sample.

6.8.1 Let Ŷ be a single bias-corrected X-method result. An
interval bounded by Ŷ ± R_XY can be expected to contain a
single corresponding Y-method result, obtained on the identical
material, about 95 % of the time. Here R_XY is computed from Eq
29 or Eq 31, as appropriate, with R_Y evaluated at Y = Ŷ.
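A minimal sketch of the 6.8.1 interval follows. The caller supplies a hypothetical function rxy_at(x, y_hat) that returns R_XY for the level in question (from Eq 29 or Eq 31). The usage lines borrow the level-dependent reproducibilities of Table X2.3 and use made-up correction coefficients a and b purely for illustration.

```python
import math


def y_interval(x_result, a, b, rxy_at):
    """Interval Yhat +/- R_XY expected to hold a Y-method result about 95 % of the time.

    rxy_at(x, y_hat) must return R_XY at that level (Eq 29 or Eq 31).
    """
    y_hat = a + b * x_result  # bias-corrected X = predicted Y (Correction Equation C1)
    r = rxy_at(x_result, y_hat)
    return y_hat - r, y_hat + r


# Hypothetical use of Eq 29 with R_X = 0.2792*sqrt(X) and R_Y = 0.1292*Y
# (Table X2.3) and made-up coefficients a, b.
a, b = -2.0, 1.0
low, high = y_interval(
    25.0, a, b,
    lambda x, y: math.sqrt((b ** 2 * (0.2792 * math.sqrt(x)) ** 2 + (0.1292 * y) ** 2) / 2.0),
)
```

Passing R_XY as a callable keeps the interval code independent of whether the constant Eq 29 form or the inflated Eq 31 form applies.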


7. Report 


7.1 Upon completion of the calculations, it is recommended
that the assessment findings be reported in the Precision and
Bias section of the appropriate test method(s). In order for the
assessment to be claimed to be compliant with this practice, the
outcome, whether it is a success or a failure, shall be reported.
For a successful outcome, it is mandatory to report the bias
correction equation, applicable test result ranges for the
equation, and the between-method reproducibility (R_XY). In the
event that one of the test methods assessed is cited as a referee
test method, with the other test method being an alternative, this
practice recommends the following naming convention, indicating
the publication year for method D YYYY by the addition of the
suffix “-yy”, and the publication year for method D XXXX by the
addition of the suffix “-xx”:


Referee Test Method designation: Test Method D YYYY-yy 
Alternative Test Method designation: Test Method D XXXX-xx 


7.2 The reporting format and information in this section
(7.2) can be followed at the discretion of the user. The phrase
“List sample types and property ranges” in this section refers to
an overview summary of the sample types used to conduct the
study. Due to the random nature of sample-specific biases, users
are not required (nor is it always possible) to explain these
biases by listing detailed characterizations of each of the samples.
Report assessment findings in the Precision and Bias section of
the appropriate test method, under a subsection titled
“Between-Method Bias,” as follows:

Degree of Agreement between results by Test Method D XXXX and 
Test Method D YYYY-yy—Results on the same materials produced 
by Test Method D XXXX and Test Method D YYYY-yy have been
assessed in accordance with procedures outlined in Practice D6708.
The findings are: (report the findings here). 


7.2.1 To choose the appropriate findings, see Table 1. (A)
represents passing, and (B) represents failure. Choose one of
the following findings (A1, A2, A3, A4, B1, B2, B3, or B4).

7.2.1.1 If the finding is A1, and R_XY, estimated with at least
30 degrees of freedom, is less than or equal to 1.2 times the
published R_Y, report the following for the property range where
R_XY satisfies the aforementioned requirement.


No bias-correction considered in Practice D6708 can further im- 
prove the agreement between results from Test Method D XXXX 
and Test Method D YYYY-yy for the materials studied (reference 
Research Report ZZZZ). For applications where Test Method X is 
used as an alternative to Test Method Y, results from Test Method 
D XXXX and Test Method D YYYY-yy may be considered to be sta- 
tistically indistinguishable, for sample types and property ranges 
listed below. No sample-specific bias, as defined in Practice D6708, 
was observed for the materials studied. 


Sample types and property range where results from method
D XXXX and D YYYY-yy may be considered to be statistically
indistinguishable are: (list applicable sample types and property
ranges here).


7.2.1.2 If the finding is A1, for the property range where R_XY
does not meet the requirement listed above, report the
following:

No bias-correction considered in Practice D6708 can further improve
the agreement between results from Test Method D XXXX and Test
Method D YYYY-yy for the materials studied (reference Research
Report ZZZZ). No sample-specific bias, as defined in Practice
D6708, was observed for the materials and property range listed
below. (List sample types and property ranges for above findings
here.)


Differences between results from Test Method D XXXX and Test
Method D YYYY-yy, for the sample types and property ranges
studied, are expected to exceed the following between methods
reproducibility (R_XY), as defined in Practice D6708, about 5 % of
the time. (Report the between methods reproducibility here.)


TABLE 1 Summary of Findings^A

Column questions:
A   Is there adequate variation in the property level of the
    sample set relative to reproducibilities?
B   Is there adequate correlation between the test results from
    Test Method XXXX and Test Method YYYY?
C   Will a scaling/bias correction significantly improve the
    agreement between the results from Test Method XXXX and Test
    Method YYYY over and above their combined reproducibilities?
D1  Are there sample-specific biases?
D2  If yes to (D1), can these biases be treated as a random effect?
D3  If no to (D1), are the residuals randomly scattered?

A     B     C     D1    D2    D3    Assessment Outcome
Yes   Yes   No    No    N/A   Yes   Pass (A1)
Yes   Yes   No    No    N/A   No    Fail (B4)
Yes   Yes   No    Yes   Yes   N/A   Pass (A2)
Yes   Yes   No    Yes   No    N/A   Fail (B3)
Yes   Yes   Yes   No    N/A   Yes   Pass (A3)
Yes   Yes   Yes   No    N/A   No    Fail (B4)
Yes   Yes   Yes   Yes   Yes   N/A   Pass (A4)
Yes   Yes   Yes   Yes   No    N/A   Fail (B3)
Yes   No    N/A   N/A   N/A   N/A   Fail (B2)
No    N/A   N/A   N/A   N/A   N/A   Fail (B1)

^A Boldfaced type indicates reason for failure.


7.2.1.3 If the finding is A2, report the following:

No bias-correction considered in Practice D6708 can further improve
the agreement between results from Test Method D XXXX and Test
Method D YYYY-yy for the material types and property range listed
below (reference Research Report ZZZZ). Sample-specific bias, as
defined in Practice D6708, was observed for some samples. (List
sample types and property ranges for above findings here.)

Differences between results from Test Method D XXXX and Test
Method D YYYY-yy, for the sample types and property ranges studied,
are expected to exceed the following between methods reproducibility
(R_XY), as defined in Practice D6708, about 5 % of the time. (Report
the between methods reproducibility here.)

As a consequence of sample-specific biases, R_XY may exceed the
reproducibility for Test Method D XXXX (R_X), or the reproducibility
for Test Method D YYYY-yy (R_Y), or both. Users intending to use Test
Method D XXXX as a predictor of Test Method D YYYY-yy, or vice
versa, are advised to assess the required degree of prediction
agreement relative to the estimated R_XY, to determine the
fitness-for-use of the prediction.


7.2.1.4 If the finding is A3, and R_XY, estimated with at least 30
degrees of freedom, is less than or equal to 1.2 times the published
R_Y, report the following for the property range where R_XY
satisfies the aforementioned requirement:

The degree of agreement between results from Test Method
D XXXX and Test Method D YYYY-yy can be further improved by
applying correction equation C1 as listed below (reference Research
Report ZZZZ). For applications where Test Method X is used as an
alternative to Test Method Y, bias-corrected results from Test Method
D XXXX (as per correction equation C1) and results from Test
Method D YYYY-yy may be considered to be statistically
indistinguishable, for sample types and property ranges listed below.
No sample-specific bias, as defined in Practice D6708, was observed
after the bias-correction for the materials studied.

Sample types and property range where bias-corrected results from
method D XXXX and results from method D YYYY-yy may be
considered to be statistically indistinguishable are: (list applicable
sample types and property ranges here).

7.2.1.5 If the finding is A3, for the property range where R_XY
does not meet the requirement listed above, report the
following:


The degree of agreement between results from Test Method
D XXXX and Test Method D YYYY-yy can be further improved by
applying correction equation C1 as listed below (reference Research
Report ZZZZ). No sample-specific bias, as defined in Practice
D6708, was observed after the bias-correction for the materials and
property range listed below.

(List sample types and property ranges for above findings here.)


Correction Equation C1: 


bias-corrected X = predicted Y = bX + a; b = xxx; a = uuu 


where:
X = result obtained by Test Method D XXXX,
bias-corrected X = predicted Y,
predicted Y = result that would have been obtained by Test Method
   D YYYY-yy on the same sample, and
b, a = parameter estimates for a linear correction as defined
   in this practice.


Differences between bias-corrected results from Test Method
D XXXX and Test Method D YYYY-yy, for the sample types and
property ranges studied, are expected to exceed the following
between methods reproducibility (R_XY), as defined in Practice
D6708, about 5 % of the time. (Report the between methods
reproducibility here.)


7.2.1.6 If the finding is A4, report the following: 


The degree of agreement between results from Test Method
D XXXX and Test Method D YYYY-yy can be further improved by
applying correction equation C1 as listed below (reference Research 
Report ZZZZ). Sample-specific bias, as defined in Practice D6708, 
was observed for some samples after applying the bias-correction, 
for the material types and property range listed below. (List sample 
types and property ranges for above findings here.) 


Correction Equation C1: 


bias-corrected X = predicted Y = bX + a; b = xxx; a = uuu 


where:
X = result obtained by Test Method D XXXX,
bias-corrected X = predicted Y,
predicted Y = result that would have been obtained by Test Method
   D YYYY-yy on the same sample, and
b, a = parameter estimates for a linear correction as defined
   in this practice.


Differences between bias-corrected results from Test Method
D XXXX and Test Method D YYYY-yy, for the sample types and
property ranges studied, are expected to exceed the following
between methods reproducibility (R_XY), as defined in Practice
D6708, about 5 % of the time. (Report the between methods
reproducibility here.)


As a consequence of sample-specific biases, R_XY may exceed the
reproducibility for Test Method D XXXX (R_X), or the reproducibility
for Test Method D YYYY-yy (R_Y), or both. Users intending to use
Test Method D XXXX as a predictor of Test Method D YYYY-yy, or
vice versa, are advised to assess the required degree of prediction
agreement relative to the estimated R_XY to determine the
fitness-for-use of the prediction.


7.2.1.7 If the finding is B1, report the following:

Test material property differences cannot be reliably distinguished
by either Test Method D XXXX, or Test Method D YYYY-yy, or both.

7.2.1.8 If the finding is B2, report the following:

There is an insufficient degree of agreement (correlation) between
Test Method D XXXX and Test Method D YYYY-yy.

7.2.1.9 If the finding is B3, report the following:

There are unpredictable sample-specific biases for some samples.
(Insert additional information regarding the sources of the
sample-specific bias here, if any are known.)

7.2.1.10 If the finding is B4, report the following:

There is unpredictable between methods reproducibility.


8. Validation of Assessment Findings Using Proficiency
Testing (PT) Program Data

8.1 The assessment findings as reported should be validated
using PT data (if available) that are not used for the assessment.
If these data are available on a regular basis, the validation
should also be carried out on a regular basis using the EWMA
control chart techniques described in Practice D6299.

8.1.1 The statistical treatment of data from the PT program
should be functionally equivalent to techniques used by ASTM
subcommittee D02.01 National Exchange Group (NEG), or by
ASTM subcommittee D02.CS92.

8.1.2 The TPI Industry (see Guide D7372) for the PT data
used to carry out this validation should be greater than 1.2.

8.1.3 The validation should be performed using the following
difference statistic D, or other statistically equivalent
techniques. For a single value D, the assessment findings are
considered validated if the absolute value is less than or equal
to 3. For a control chart, the D values are expected to randomly
vary on either side of zero. Sustained values of D on either the
positive or negative side of zero should trigger activities for a
reassessment.

$$D = \frac{\bar{Y} - (b\bar{X} + a)}{\sqrt{SE_Y^2 + SE_{\hat{X}}^2}} \tag{32}$$

where:
Ȳ = average of Y method from ILS of the same material,
X̄ = average of X method from ILS of the same material,
b, a = outcome from the bias assessment; for outcome A1, b = 1, a = 0,
SE_Y = [standard error of Ȳ] = 0.36 × R_Y/√L_Y,
SE_X̂ = [standard error of bias-corrected X̄] = 0.36 × |b| × R_X/√L_X,
R_X, R_Y = published reproducibility for methods X and Y, and
L_X, L_Y = number of non-rejected results used to calculate the ILS
   averages for methods X and Y.
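Eq 32 and the single-value criterion of 8.1.3 can be sketched as follows. The 0.36 factor is presumably about 1/(1.96√2), converting a reproducibility into a standard deviation; the standard error of the bias-corrected average is taken as 0.36 × |b| × R_X/√L_X, as written in the definitions above. Function names are hypothetical, and control-chart use (EWMA per Practice D6299) is not covered.

```python
import math


def d_statistic(y_bar, x_bar, a, b, rx, ry, lx, ly):
    """Eq 32 sketch: standardized difference between PT averages."""
    se_y = 0.36 * ry / math.sqrt(ly)               # standard error of Y-bar
    se_xhat = 0.36 * abs(b) * rx / math.sqrt(lx)   # standard error of b*X-bar + a
    return (y_bar - (b * x_bar + a)) / math.sqrt(se_y ** 2 + se_xhat ** 2)


def single_value_validated(d):
    """8.1.3: a single D validates the assessment findings when |D| <= 3."""
    return abs(d) <= 3.0
```

For an A1 outcome, call it with a = 0 and b = 1, so D reduces to a standardized difference of the two PT averages.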
APPENDIXES 


(Nonmandatory Information) 


X1. STATISTICAL BASIS 


X1.1 Adequacy of Round Robin Sample Set 


X1.1.1 In order to obtain a usable comparison between two 
test methods, it is critical that the samples are sufficiently 
varied that they can be distinguished from one another (or at 
least so that some can be distinguished from some others) using 
the test methods in question. The most straightforward test 
involves the total (weighted) sum of squares, which, for the X
measurement, is:

$$TSS_X = \sum_{i=1}^{S} \left(\frac{X_i - \bar{X}}{s_{X,i}}\right)^2 \tag{X1.1}$$


where:
X̄ = the mean of the mean X-results weighted by the reciprocal
   of the squares of the standard errors (s_X,i):

$$\bar{X} = \frac{\sum_{i=1}^{S} X_i / s_{X,i}^2}{\sum_{i=1}^{S} 1 / s_{X,i}^2} \tag{X1.2}$$


X1.1.2 If the S samples were all the same material, if the
{X_i} were distributed normally, and if the standard errors were
known exactly, then TSS_X would have a chi-square distribution
with S − 1 degrees of freedom. In practice, the {s_X,i} are not
known exactly, but our situation approximates one in which
TSS_X/(S − 1) would have an F distribution, with S − 1 degrees of
freedom in the numerator and v degrees of freedom in the
denominator, where v is the degrees of freedom associated with
the reproducibility estimate.


X1.1.3 If the materials were not all the same, then we would
expect TSS_X/(S − 1) to be larger than an F-distributed variable.
For round robins, samples will ideally have been selected with a
range of property values, so TSS_X/(S − 1) will be very much
larger than the 95th percentile of F. If we come even close to
failing this test, or the analogous test using the Y-method data,
then the best course of action would be to start over with a more
variable set of samples.


X1.2 Quantifying the Closeness of Agreement Between 
Two Test Methods 


X1.2.1 Suppose we use a calibration function, f(X), to
estimate (or predict) the property as measured by a reference
Y-method. For the round robin samples, the mean result by the
reference method, Y_i, can be compared to f(X_i) and used to
quantify the closeness of agreement. In classical (weighted)
regression, the weighted residual sum of squares,

$$\sum_{i=1}^{S} \frac{\left(Y_i - f(X_i)\right)^2}{s_{Y,i}^2} \tag{X1.3}$$

is used as a measure of the closeness of agreement. If
competing calibration functions are under consideration,
classical least-squares regression suggests we should prefer the
one with the smallest sum of squares (Eq X1.3). But this
overlooks the fact that the {X_i} are not the true values of the
property as measured by the alternative method, but only
estimates of those values, and they also involve random error.
Let {h_i} represent the true, unknown values of the property as
measured by the reference method. The {h_i} will be estimated
from the data. Both Y_i and f(X_i) estimate h_i, which is not
known. Y_i has variance s_Y,i², and f(X_i) has variance
approximately f'(X_i)² s_X,i², where f'(X_i) is the derivative of f
at X_i. So an alternative measure of closeness is:

$$\min_{\{h_i\}} \sum_{i=1}^{S} \left[\frac{(Y_i - h_i)^2}{s_{Y,i}^2} + \frac{\left(f(X_i) - h_i\right)^2}{f'(X_i)^2\, s_{X,i}^2}\right] \tag{X1.4}$$

X1.2.2 This sum can be minimized term by term. The value
of h_i that minimizes the ith term, and the value that is our best
estimate of the true value, is:

$$h_i = \frac{Y_i\, f'(X_i)^2 s_{X,i}^2 + f(X_i)\, s_{Y,i}^2}{f'(X_i)^2 s_{X,i}^2 + s_{Y,i}^2} \tag{X1.5}$$

and the minimized sum of squares is:

$$CSS = \sum_{i=1}^{S} \frac{\left(Y_i - f(X_i)\right)^2}{s_{Y,i}^2 + f'(X_i)^2 s_{X,i}^2} \tag{X1.6}$$

X1.2.3 Compare Eq X1.6 to Eq X1.3, and note that the only
difference is that, in place of the variance of Y_i in the
denominator of each term, Eq X1.6 has the variance of
Y_i − f(X_i).
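For a linear calibration function f(X) = a + bX (so f'(X) = b), Eq X1.5 and one term of Eq X1.6 can be sketched as below. The helper x14_term evaluates a single term of Eq X1.4 for any candidate h, which makes it easy to check that h_i from Eq X1.5 attains the CSS term and that moving away from it only increases the sum. All names are hypothetical.

```python
def best_estimate_h(y, x, sy, sx, a, b):
    """Eq X1.5 for linear f(X) = a + bX, where f'(X) = b."""
    fx = a + b * x
    var_f = (b * sx) ** 2   # approximate variance of f(X_i): f'(X_i)^2 * s_X,i^2
    var_y = sy ** 2
    return (y * var_f + fx * var_y) / (var_f + var_y)


def css_term(y, x, sy, sx, a, b):
    """One term of Eq X1.6."""
    return (y - (a + b * x)) ** 2 / (sy ** 2 + (b * sx) ** 2)


def x14_term(h, y, x, sy, sx, a, b):
    """One term of the sum in Eq X1.4, before minimizing over h."""
    return (y - h) ** 2 / sy ** 2 + (a + b * x - h) ** 2 / (b * sx) ** 2


# Hypothetical sample: Y_i = 10, X_i = 9, unit standard errors, f(X) = X.
h = best_estimate_h(10.0, 9.0, 1.0, 1.0, 0.0, 1.0)
```

With equal variances, h lands halfway between Y_i and f(X_i); with unequal variances it is pulled towards the more precise of the two, as noted in X1.3.1.3.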


X1.3 Properties of the Closeness Metric 


X1.3.1 Distributional Properties: 
X1.3.1.1 If the {X_i} and {Y_i} are independent normal, if the
standard errors are known exactly, if f is linear (so that the
{f(X_i)} are normal), and if E[Y_i] = E[f(X_i)] for all i, where
E[Y] represents the mean or expected value of the distribution of
Y, then CSS has a chi-square distribution. The degrees of freedom
associated with CSS is S, the number of materials (samples)
common to the round robins. This may be seen from the fact that
Eq X1.4 has 2S terms, but S parameters {h_i} are fitted by
least squares.

X1.3.1.2 When E[Y_i] ≠ E[f(X_i)], it may be because the
calibration function, f, is not known exactly. If f belongs to a
specific class of functions (linear functions, for example), then
the unknown parameters of f (for example, a and b if
f(X) = a + bX) may be estimated by minimizing Eq X1.4 with
respect to these parameters. In this case, CSS would be
distributed as chi-square with S − k degrees of freedom, where k
is the number of parameters estimated.

X1.3.1.3 But if CSS is evaluated using an incorrect calibra-
tion equation, or by minimizing over a class of equations that
does not contain the true calibration equation, or if there are
sample-specific biases that cannot be accounted for by any
calibration function, then CSS can be expected to be larger
than a chi-square variable. The last of these three situations is
worth special consideration. In the event that two or more
different materials have the same true value, E[Y], as
measured by one method, but different true values, E[X], as
measured by the other method, then no calibration equation can
completely account for the differences between the two
methods. Such sample-specific biases can be the dominant
contributor to CSS. In fact, they almost certainly will be the
dominant contributor when the {X_i} and {Y_i} are very precise,
that is, when the materials are measured by sufficiently large
numbers of labs. In such cases, note that an h_i of Eq X1.5 will
approximate neither E[Y_i] nor E[X_i], but instead approximates
an average of the two, an average that is weighted towards the
more precise of Y_i and X_i.

X1.3.1.4 When the standard errors are not known, but are
approximately proportional to the same standard deviation
estimate, an F distribution may be a reasonable approximation
to the distribution of CSS/S, or CSS/(S − k), as appropriate.


X1.3.2 Symmetry in X and Y: 

X1.3.2.1 Note that, if f is linear, then Eq X1.6 is indepen-
dent of which method is considered the reference method. If
instead of predicting Y with f(X), we wish to predict X with
f⁻¹(Y), then f'(X_i) = b = 1/(f⁻¹)'(Y_i), and
Y_i − f(X_i) = b(f⁻¹(Y_i) − X_i), so b² cancels from the top and
bottom of each term and Eq X1.6 is unchanged.
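The symmetry argument can be checked numerically with a short sketch. A single function evaluates Eq X1.6 for a linear predictor; calling it once with f(X) = a + bX to predict Y, and once with the inverse f⁻¹(Y) = (Y − a)/b (intercept −a/b, slope 1/b) to predict X, gives the same CSS. The round robin means, standard errors, and coefficients below are made up for illustration.

```python
def css_linear(targets, predictors, s_target, s_pred, intercept, slope):
    """Eq X1.6 for f(P) = intercept + slope * P, predicting the target method."""
    return sum(
        (t - (intercept + slope * p)) ** 2 / (st ** 2 + (slope * sp) ** 2)
        for t, p, st, sp in zip(targets, predictors, s_target, s_pred)
    )


# Hypothetical round robin means and standard errors.
x = [10.0, 15.0, 22.0, 30.0]
y = [11.2, 15.9, 24.1, 32.5]
sx = [0.20, 0.25, 0.30, 0.35]
sy = [0.30, 0.30, 0.40, 0.50]
a, b = 0.5, 1.05

forward = css_linear(y, x, sy, sx, a, b)             # predict Y with f(X)
reverse = css_linear(x, y, sx, sy, -a / b, 1.0 / b)  # predict X with f^-1(Y)
```

The two sums agree to rounding error, whereas ordinary least-squares slopes from regressing Y on X and X on Y would not be reciprocals of one another.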

X1.3.2.2 This symmetry property is not shared by classical
regression: the slope obtained from regressing Y on X is
always smaller in magnitude than the reciprocal of the slope
from regressing X on Y, unless the correlation is perfect. The
method developed in this appendix is a weighted version of what
is known as regression with errors in both variables, which is
discussed in many texts.* For non-linear f, the symmetry is lost.
But for smooth f, the two equalities above still hold
approximately.


X1.3.3 An Explanation of Eq 31 in 6.7.3 of Practice D6708: 
X1.3.3.1 Recall that: 


^ Mandel, John, Evaluation and Control of Measurements, Marcel Dekker, 1991, 
Sec. 5.5. 
$$CSS = \sum_{i=1}^{S} \frac{(Y_i - \hat{Y}_i)^2}{S_{\hat{Y},i}^2 + S_{Y,i}^2} \tag{X1.7}$$

where:
Ŷ_i = a + bX_i,
S_Y,i = the standard error of Y_i, and
S_Ŷ,i = bS_X,i = the standard error of Ŷ_i.

Presuming S_X,i and S_Y,i to be known constants, then, in the
absence of sample-specific biases, CSS should have a chi-
square distribution, with degrees of freedom depending upon
the number of samples and the number of parameters (a and/or
b) estimated from the data.

X1.3.3.2 The expected value of CSS is just the degrees of
freedom, v. If CSS is not significantly larger than v, that is, if
it is less than the 95th percentile of the chi-square distribution,
then we may conclude that there is no evidence of sample-specific
bias. Otherwise, the amount by which CSS exceeds v is attributed
to sample-specific bias. Some appropriate amount of this
difference has to be added to the square of the between-method
reproducibility.

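X1.3.3.3 If E[Y_i] = μ_i and E[Ŷ_i] = η_i, then the bias specific
to the ith sample is μ_i − η_i, and:

$$E[CSS] = \sum_{i=1}^{S} \frac{E\left[(Y_i - \hat{Y}_i)^2\right]}{S_{\hat{Y},i}^2 + S_{Y,i}^2} = \sum_{i=1}^{S} \frac{E\left[(Y_i - \mu_i - \hat{Y}_i + \eta_i)^2\right] + 2(\mu_i - \eta_i)\,E\left[Y_i - \mu_i - \hat{Y}_i + \eta_i\right] + (\mu_i - \eta_i)^2}{S_{\hat{Y},i}^2 + S_{Y,i}^2} \tag{X1.8}$$

X1.3.3.4 Since E[Y_i − μ_i − Ŷ_i + η_i] = 0, the cross term
vanishes. The remaining first sum is the value E[CSS] would take
in the absence of sample-specific bias, which is the degrees of
freedom, v (see X1.3.3.2). We therefore have:

$$E[CSS] = v + \sum_{i=1}^{S} \frac{(\mu_i - \eta_i)^2}{S_{\hat{Y},i}^2 + S_{Y,i}^2} \tag{X1.9}$$

or

$$\sum_{i=1}^{S} \frac{(\mu_i - \eta_i)^2}{S_{\hat{Y},i}^2 + S_{Y,i}^2} = E[CSS] - v \tag{X1.10}$$

(Eq X1.10 in this case is not exact since S_Ŷ,i = bS_X,i, and b is
random, so the expectation operator does not push through as
shown. However, it is satisfactory as an approximation.)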


X1.3.3.5 The expectation operator above is appropriate
under the assumption that {μ_i} and {η_i} are fixed constants. If
instead we assume that they are random (that is, they vary from
one material to another in a manner that we may consider to be
random), then Eq X1.10 holds with (μ_i − η_i)² replaced by its
conditional expectation given material i, and E[CSS] replaced
by E[CSS | sample materials].


X1.3.3.6 For estimation, we can exchange expectations on
either side of this equation. We can estimate

$$\sum_{i=1}^{S} \frac{(\mu_i - \eta_i)^2}{S_{\hat{Y},i}^2 + S_{Y,i}^2}$$

by CSS − v (when CSS is significantly larger than v), and, to take
it one step further, this estimates

$$\sum_{i=1}^{S} \frac{E\left[(\mu_i - \eta_i)^2\right]}{S_{\hat{Y},i}^2 + S_{Y,i}^2}$$

where now the expectation is unconditional (that is,
E[μ_i] = E[η_i]), and E[(μ_i − η_i)²] depends on the ith material
only through its level, E[μ_i].


X1.3.3.7 In the absence of sample-specific bias, the
between-method reproducibility is just the root mean square of
the reproducibilities of the two methods:

$$R_{XY} = \sqrt{\frac{b^2 R_X^2 + R_Y^2}{2}} \tag{X1.11}$$

which is Eq 29 of the practice. But when sample-specific
biases are present, then the excess variation needs to be
accounted for:

$$R_{XY} = \sqrt{\frac{b^2 R_X^2 + R_Y^2}{2} + (1.96)^2\,E\left[(\mu - \eta)^2\right]} \tag{X1.12}$$

X1.3.3.8 Like R_X and R_Y, E[(μ − η)²] may depend on the
level of concentration. There really is not enough information
in a limited data set to allow us to estimate this relationship, so
we need to make some assumptions. It seems reasonable that
E[(μ − η)²] should grow in a manner similar to R_X² or R_Y², or
R_X² + R_Y², or pR_X² + qR_Y² for some choice of p and q. Eq 31 of
the practice uses what seems to be a reasonable assumption, that
is, E[(μ − η)²] is proportional to b²R_X² + R_Y², so E[(μ − η)²]
varies with level (concentration, etc.) in proportion to
b²R_X² + R_Y². (This is fair to both methods.) Thus, for some
constant K:

$$E\left[(\mu - \eta)^2\right] = K\left(b^2 R_X^2 + R_Y^2\right)$$


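X1.3.3.9 Then from Eq X1.12,

$$R_{XY} = \sqrt{\frac{b^2 R_X^2 + R_Y^2}{2} + (1.96)^2 K\left(b^2 R_X^2 + R_Y^2\right)} \tag{X1.13}$$

From Eq X1.10, CSS − v is an estimate of

$$\sum_{i=1}^{S} \frac{E\left[(\mu_i - \eta_i)^2\right]}{b^2 S_{X,i}^2 + S_{Y,i}^2} = K \sum_{i=1}^{S} \frac{b^2 R_{X,i}^2 + R_{Y,i}^2}{b^2 S_{X,i}^2 + S_{Y,i}^2} \tag{X1.14}$$

so K can be estimated by

$$\hat{K} = \frac{CSS - v}{\sum_{i=1}^{S} \dfrac{b^2 R_{X,i}^2 + R_{Y,i}^2}{b^2 S_{X,i}^2 + S_{Y,i}^2}} \tag{X1.15}$$

This approximation results in

$$R_{XY} = \sqrt{\frac{b^2 R_X^2 + R_Y^2}{2}\left[1 + \frac{2(1.96)^2\,(CSS - v)}{\sum_{i=1}^{S} \dfrac{b^2 R_{X,i}^2 + R_{Y,i}^2}{b^2 S_{X,i}^2 + S_{Y,i}^2}}\right]} \tag{X1.16}$$

$$R_{XY} = \sqrt{\left(b^2 R_X^2 + R_Y^2\right)\left[\frac{1}{2} + \frac{(1.96)^2\,(CSS - v)}{\sum_{i=1}^{S} \dfrac{b^2 R_{X,i}^2 + R_{Y,i}^2}{b^2 S_{X,i}^2 + S_{Y,i}^2}}\right]} \tag{X1.17}$$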
TABLE X2.1 Aromatics by Test Method D5580


Fuel 

Laboratory 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 
1 23.76 26.34 25.14 22.76 29.10 14.83 19.77 42.61 21.77 19.85 37.40 31.53 16.48 19.26 13.26 
24.22 29.16 19.81 12.99 
2 24.46 25.88 25.72 22.59 29.08 15.68 19.92 41.89 21.68 19.97 37.38 31.35 16.55 19.48 13.25 
24.59 25.94 25.76 22.57 29.07 15.64 19.82 42.10 22.00 20.02 37.09 31.29 16.58 19.63 13.53 
3 24.50 25.36 26.28 22.87 29.28 15.71 20.12 42.90 21.93 20.02 38.05 31.63 16.72 19.72 13.50 
24.54 25.17 26.26 22.65 29.33 15.76 20.01 42.90 21.91 20.14 38.07 31.80 16.60 19.82 13.54 
4 24.74 25.23 25.72 22.82 29.31 15.51 20.35 42.52 22.24 20.32 37.03 31.77 16.50 20.03 13.63 
24.90 25.19 25.65 22.68 29.21 15.48 19.99 42.38 22.14 20.01 37.44 31.80 16.45 19.84 13.69 
5 24.64 26.01 25.92 22.17 30.50 14.78 19.37 43.71 22.85 20.43 37.80 31.09 16.27 20.85 13.85 
24.70 25.87 25.87 22.20 30.69 14.88 19.66 44.00 23.50 20.30 37.84 31.31 16.55 21.01 13.85 
6 24.93 26.28 26.07 22.59 30.08 15.91 20.30 43.08 22.24 20.26 38.28 32.60 16.70 19.94 13.67 
25.13 26.72 26.08 22.90 30.10 16.16 20.49 43.27 22.56 20.58 38.54 32.72 16.97 19.94 13.89 
7 24.37 25.40 25.66 21.93 29.11 15.30 19.33 42.08 21.88 19.79 36.28 30.60 15.87 19.30 12.91 
24.36 25.36 25.72 21.97 29.18 15.10 19.32 41.77 21.98 19.71 37.19 30.65 15.91 19.23 12.91 
Mean 24.56 25.79 25.78 22.53 29.51 15.40 19.87 42.70 22.17 20.09 37.56 31.55 16.47 19.81 13.46 
Standard Error 0.177 0.181 0.181 0.170 0.193 0.140 0.159 0.234 0.168 0.160 0.219 0.201 0.145 0.159 0.131


X2. A WORKED EXAMPLE 


X2.1 Example Data 


X2.1.1 The data in Tables X2.1 and X2.2 are from a round 
robin for aromatics in gasoline conducted by seven labs. 
Fifteen (S = 15) fuels were tested by two methods. Table X2.1
contains the results from Test Method D5580, a gas chromatography
(GC) method, while Table X2.2 contains the results from Test
Method D5769, gas chromatography/mass spectrometry (GC/ 
MS). No data have been removed as outliers, but some repeat 
results are missing for Test Method D5580. For purposes of 
this example, designate Test Methods D5580 and D5769 as the 
X and Y methods, respectively. 


Note X2.1—All equations referenced are from this standard except as 
noted. 


X2.1.2 The repeatabilities and reproducibilities were esti-
mated from the round robins in accordance with Practice
D6300. These are shown in Table X2.3. The degrees of
freedom are also from the precision analysis. The standard
deviations associated with repeatability and reproducibility are
obtained by dividing the precision estimates by t_0.975 √2, where
t_0.975 is the 97.5th percentile of the t-distribution with the
applicable number of degrees of freedom.
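The conversion in X2.1.2 can be reproduced from the precision estimates and t-values in Table X2.3. The sketch below works with the coefficients only; each resulting standard deviation carries the same level dependence (√X for the X method, Y for the Y method) as its precision estimate. Names are hypothetical.

```python
import math

# Precision-estimate coefficients and t(0.975) values from Table X2.3.
TABLE_X2_3 = {
    "r_X": (0.0831, 1.986),
    "R_X": (0.2792, 2.048),
    "r_Y": (0.0292, 1.983),
    "R_Y": (0.1292, 2.262),
}


def precision_to_sd(estimate, t975):
    """Standard deviation = precision estimate / (t_0.975 * sqrt(2))."""
    return estimate / (t975 * math.sqrt(2.0))


sds = {name: precision_to_sd(est, t) for name, (est, t) in TABLE_X2_3.items()}
```

Rounded to four decimals these reproduce the standard deviation column of Table X2.3.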


X2.2 Calculation of the Mean Results and Standard Er- 
rors 


X2.2.1 Both round robins included seven participants, and
all participants measured every sample, so L_X,i = L_Y,i = 7 for all
i. As an example, for the second sample from method X, X_2 is
calculated using Eq 1 as follows:


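$$\begin{aligned} X_2 &= \frac{1}{7}\left(26.34 + \frac{25.88 + 25.94}{2} + \frac{25.36 + 25.17}{2} + \frac{25.23 + 25.19}{2} + \frac{26.01 + 25.87}{2} + \frac{26.28 + 26.72}{2} + \frac{25.40 + 25.36}{2}\right) \\ &= \frac{1}{7}\left(26.34 + 25.91 + 25.265 + 25.21 + 25.94 + 26.5 + 25.38\right) = 25.79 \end{aligned} \tag{X2.1}$$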


X2.2.2 Note that this is not the same as the average of the
thirteen X-method results on this sample. The remaining X_i and
Y_i are computed in a similar fashion.


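X2.2.3 The standard error of each mean is calculated using
Eq 3. Again for the second sample X-method results, the n_j are
all equal to 2, except n_1 = 1, so

$$\frac{1}{L_{X,2}}\sum_{j=1}^{L_{X,2}} \frac{1}{n_j} = \frac{1}{7}\left(1 + 6 \times \frac{1}{2}\right) = \frac{4}{7}$$

and

$$s_{X,2} = \sqrt{\frac{1}{7}\left[0.0964^2 - 0.0296^2\left(1 - \frac{4}{7}\right)\right] \times 25.79} = 0.181 \tag{X2.2}$$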


X2.2.4 The means and standard errors for each fuel by both 
methods are found at the bottoms of their respective tables 
(Tables X2.1 and X2.2). 
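The calculations of X2.2.1 to X2.2.3 for fuel 2 can be reproduced as below. Eq 1 and Eq 3 are defined earlier in the practice and are not reproduced in this section; the standard-error function uses the form sqrt([s_R² − s_r²(1 − (1/L)Σ1/n_j)]/L), which is an assumption inferred from the arithmetic shown in X2.2.3, with s_R and s_r scaling with √X as in Table X2.3. Names are hypothetical.

```python
import math

# Fuel 2, Test Method D5580 (X method), from Table X2.1; laboratory 1
# reported a single result, the other six laboratories reported two.
fuel2 = [
    [26.34],
    [25.88, 25.94],
    [25.36, 25.17],
    [25.23, 25.19],
    [26.01, 25.87],
    [26.28, 26.72],
    [25.40, 25.36],
]


def mean_of_lab_means(labs):
    """X_i as in X2.2.1: average the laboratory averages, not the raw results."""
    lab_means = [sum(results) / len(results) for results in labs]
    return sum(lab_means) / len(lab_means)


def standard_error(labs, level, s_R_coef, s_r_coef):
    """Assumed form of Eq 3, inferred from the X2.2.3 arithmetic."""
    n_labs = len(labs)
    mean_inv_n = sum(1.0 / len(results) for results in labs) / n_labs
    s_R2 = s_R_coef ** 2 * level  # s_R,X = 0.0964 sqrt(X)
    s_r2 = s_r_coef ** 2 * level  # s_r,X = 0.0296 sqrt(X)
    return math.sqrt((s_R2 - s_r2 * (1.0 - mean_inv_n)) / n_labs)


x2 = mean_of_lab_means(fuel2)
se2 = standard_error(fuel2, x2, 0.0964, 0.0296)
# Plain average of all thirteen results, for contrast with X2.2.2 (about 25.75).
raw_average = sum(sum(r) for r in fuel2) / sum(len(r) for r in fuel2)
```

Averaging laboratory means first gives each laboratory equal weight, which is why X_2 differs from the plain average of the thirteen results.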


X2.3 Calculate the Total Variation Sum of Squares 


X2.3.1 Table X2.4 demonstrates the application of Eq 4 and
5 to obtain the total sum of squares for the Y-method means.
The weighted mean, Ȳ, is found to be 3333.81/186.8 = 17.85.
TSS_Y = 6564.8. We compare 6564.8/14 = 469 to the 95th
percentile of the F distribution with 14 and 9 degrees of
freedom for the numerator and denominator, respectively. The
F percentile is 3.03. Hence, we conclude TSS_Y is highly
statistically significant. Similarly, a high degree of significance
is also found for TSS_X.
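The arithmetic of X2.3.1 can be reproduced approximately from the rounded means and standard errors at the foot of Table X2.2, using the Y-method analogues of Eq X1.1 and X1.2. Because those inputs are rounded, the weighted mean and TSS_Y agree with 17.85 and 6564.8 only to within rounding. The F percentile is taken from the text rather than computed.

```python
# Y-method (Test Method D5769) means and standard errors, Table X2.2.
y_means = [22.87, 21.91, 23.43, 21.17, 27.10, 11.77, 16.60, 40.20,
           19.59, 17.94, 34.91, 29.12, 15.32, 18.40, 12.30]
y_se = [0.345, 0.330, 0.353, 0.319, 0.408, 0.177, 0.250, 0.606,
        0.295, 0.270, 0.526, 0.439, 0.231, 0.277, 0.185]

weights = [1.0 / s ** 2 for s in y_se]
# Weighted mean (Y analogue of Eq X1.2).
y_bar = sum(w * y for w, y in zip(weights, y_means)) / sum(weights)
# Total weighted sum of squares (Y analogue of Eq X1.1).
tss_y = sum(w * (y - y_bar) ** 2 for w, y in zip(weights, y_means))

F_95_14_9 = 3.03  # 95th percentile of F(14, 9) as quoted in X2.3.1
significant = tss_y / (len(y_means) - 1) > F_95_14_9
```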


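X2.4 Test Whether the Methods are Sufficiently Corre-
lated

X2.4.1 From Eq 6 compute r:

$$r = \frac{\sum_i w_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_i w_i (X_i - \bar{X})^2 \sum_i w_i (Y_i - \bar{Y})^2}} = \frac{5069.0}{\sqrt{5228.3 \times 5033.6}} = 0.9881 \tag{X2.3}$$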
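A sketch of the weighted correlation in the form shown in Eq X2.3 follows, together with an arithmetic check of the quoted sums. Eq 6 and the definition of its weights w_i are given earlier in the practice and are not reproduced here; the sketch assumes the means X̄ and Ȳ are weighted with the same w_i, which should be confirmed against Eq 6. The function name is hypothetical.

```python
import math


def weighted_r(x, y, w):
    """Weighted correlation in the form of Eq X2.3 (means weighted by w, assumed)."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - y_bar) ** 2 for wi, yi in zip(w, y))
    return sxy / math.sqrt(sxx * syy)


# Arithmetic check of X2.4.1 from the quoted sums of squares and products.
r_example = 5069.0 / math.sqrt(5228.3 * 5033.6)
```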
TABLE X2.2 Aromatics by Test Method D5769


Fuel 

Laboratory 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 
1 21.33 21.37 22.21 20.90 26.19 10.88 15.88 38.58 18.66 16.81 33.14 27.87 14.74 17.72 11.78 
22.01 21.12 21.99 20.98 25.88 10.93 16.07 38.39 18.41 17.21 33.76 28.39 14.77 17.68 12.12 
2 21.70 21.32 22.20 20.79 26.85 11.60 16.26 40.33 19.29 17.41 34.32 29.28 14.99 18.10 12.31 
21.79 21.15 22.60 20.69 26.57 11.84 16.25 38.86 18.79 17.28 33.99 28.48 14.86 18.13 12.24 
3 24.09 23.36 24.71 22.40 27.99 12.45 17.31 41.40 20.65 19.83 35.18 29.96 16.24 19.81 12.94 
24.32 23.57 24.93 22.26 28.08 12.31 17.26 41.36 20.88 18.94 36.35 29.82 16.48 19.42 12.81 
4 23.43 22.59 24.15 21.55 27.58 12.23 17.09 41.04 20.14 18.53 35.80 30.28 15.39 18.23 12.52 
23.08 22.54 23.99 21.61 27.50 12.36 17.15 41.11 20.37 18.46 35.98 30.12 15.48 18.23 12.59
5 23.63 22.65 24.54 21.26 28.10 12.52 17.49 41.79 20.47 18.73 35.67 30.01 15.74 18.99 12.31 
24.33 22.69 24.88 22.36 28.24 12.48 17.26 40.71 20.29 18.31 35.84 30.03 16.08 18.73 12.30 
6 22.38 20.43 22.70 20.13 26.34 11.27 15.72 38.89 18.74 17.13 34.29 27.73 14.97 18.56 12.17
22.53 20.40 22.86 20.39 26.44 11.24 15.54 39.13 18.71 17.26 34.74 27.85 15.00 18.59 12.05 
7 22.84 21.79 22.90 20.85 27.10 11.33 16.36 40.88 19.50 17.76 34.93 28.80 15.05 17.82 12.01 
22.72 21.76 23.32 20.25 26.47 11.33 16.79 40.27 19.42 17.50 34.71 29.11 14.87 17.56 11.99 
Mean 22.87 21.91 23.43 21.17 27.10 11.77 16.60 40.20 19.59 17.94 34.91 29.12 15.32 18.40 12.30 
Standard Error 0.345 0.330 0.353 0.319 0.408 0.177 0.250 0.606 0.295 0.270 0.526 0.439 0.231 0.277 0.185


TABLE X2.3 Precision Estimates and Associated Standard Deviations^A

Precision Estimate   Degrees of Freedom   t(0.975)   Standard Deviation
$r_X = 0.0831\sqrt{X}$   94   1.986   $s_{rX} = 0.0296\sqrt{X}$
$R_X = 0.2792\sqrt{X}$   28   2.048   $s_{RX} = 0.0964\sqrt{X}$
$r_Y = 0.0292Y$   105   1.983   $s_{rY} = 0.0104Y$
$R_Y = 0.1292Y$   9   2.262   $s_{RY} = 0.0404Y$

^A This interlaboratory study did not meet the minimum degrees of freedom requirement (30) recommended in Practice D6300. The low degrees of freedom for $R_X$ and $R_Y$ suggest the need for further interlaboratory standardization, and the latter could be a contributing factor towards the sample-specific biases observed.


TABLE X2.4 Total Variation Sum of Squares for Y-Method 


$i$ $Y_i$ $s_{Y_i}$ $1/s_{Y_i}^2$ $Y_i/s_{Y_i}^2$ $(Y_i - \bar{Y})^2/s_{Y_i}^2$
1 22.87 0.345 8.42 192.57 212.48 
2 21.91 0.330 9.17 201.01 151.48 
3 23.43 0.353 8.02 187.99 249.90 
4 21.17 0.319 9.82 208.01 108.70 
5 27.10 0.408 6.00 162.54 513.12 
6 11.77 0.177 31.80 374.21 1174.31 
7 16.60 0.250 15.98 265.27 24.75 
8 40.20 0.606 2.73 109.57 1361.51 
9 19.59 0.295 11.47 224.77 35.04 
10 17.94 0.270 13.68 245.49 0.12 
11 34.91 0.526 3.61 126.17 1052.00 
12 29.12 0.439 5.19 151.22 660.32 
13 15.32 0.231 18.76 287.42 119.47 
14 18.40 0.277 13.01 239.38 3.95 
15 12.30 0.185 29.13 358.18 897.59 
Sum 186.80 3333.81 6564.75 
Wt Avg 17.85 
In Eq X2.3, $\bar{X}$ = 20.62 and $\bar{Y}$ = 18.36 are calculated in Table X2.6. $X_i$, $Y_i$, and $w_i$ are shown again in Table X2.5, where the components of r are calculated.

X2.4.2 The test statistic for correlation is

$$F = \frac{13 \times 0.9881^2}{1 - 0.9881^2} \approx 536 \qquad (X2.4)$$

The 99th percentile of the F distribution, with 1 and 13 degrees of freedom, is 9.07. As the computed F is (very much) larger than 9.07, the methods are sufficiently correlated.
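The correlation check of X2.4 can be sketched in Python from the rounded values of Table X2.5, assuming Eq 6 is the weighted Pearson correlation of Eq X2.3 and that the test statistic is $F = (S - 2)r^2/(1 - r^2)$ with S = 15, as in Eq X2.4. The weights are taken as printed; the critical value 9.07 is copied from the text.

```python
import math

# Table X2.5: method means and the printed weights w_i
X = [24.56, 25.79, 25.78, 22.53, 29.51, 15.40, 19.87, 42.70,
     22.17, 20.09, 37.56, 31.55, 16.47, 19.81, 13.46]
Y = [22.87, 21.91, 23.43, 21.17, 27.10, 11.77, 16.60, 40.20,
     19.59, 17.94, 34.91, 29.12, 15.32, 18.40, 12.30]
W = [6.67, 7.05, 6.35, 7.66, 4.90, 19.56, 11.37, 2.37,
     8.66, 10.15, 3.08, 4.29, 13.45, 9.79, 19.45]


def weighted_correlation(x, y, w):
    """Weighted Pearson correlation about the weighted means (form of Eq X2.3)."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - y_bar) ** 2 for wi, yi in zip(w, y))
    return sxy / math.sqrt(sxx * syy)


r = weighted_correlation(X, Y, W)
S = len(X)
F = (S - 2) * r**2 / (1 - r**2)
F_CRIT = 9.07  # 99th percentile of F(1, 13), copied from the text
print(f"r = {r:.4f}, F = {F:.0f}, sufficiently correlated: {F > F_CRIT}")
```

Because F grows like $1/(1 - r^2)$, a change in the fourth decimal of r moves F by tens of units, so the rounded inputs reproduce the conclusion rather than the exact F.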
TABLE X2.5 Computing r


$i$ $X_i$ $s_{X_i}$ $Y_i$ $s_{Y_i}$ $w_i$ $w_i(X_i - \bar{X})^2$ $w_i(Y_i - \bar{Y})^2$ $w_i(X_i - \bar{X})(Y_i - \bar{Y})$
1 24.56 0.177 22.87 0.345 6.67 103.7 135.8 118.7 
2 25.79 0.181 21.91 0.330 7.05 188.6 88.9 129.5 
3 25.78 0.181 23.43 0.353 6.35 169.5 163.1 166.3 
4 22.53 0.170 21.17 0.319 7.66 28.1 60.7 41.3 
5 29.51 0.193 27.10 0.408 4.90 387.7 374.0 380.8 
6 15.40 0.140 11.77 0.177 19.56 533.3 849.2 672.9 
7 19.87 0.159 16.60 0.250 11.37 6.3 35.0 14.8 
8 42.70 0.234 40.20 0.606 2.37 1157.3 1131.8 1144.5 
9 22.17 0.168 19.59 0.295 8.66 21.0 13.2 16.7 
10 20.09 0.160 17.94 0.270 10.15 2.8 1.8 2.2 
11 37.56 0.219 34.91 0.526 3.08 884.0 843.7 863.6 
12 31.55 0.201 29.12 0.439 4.29 513.0 497.7 505.3 
13 16.47 0.145 15.32 0.231 13.45 230.9 123.9 169.1 
14 19.81 0.159 18.40 0.277 9.79 6.4 0.0 -0.3 
15 13.46 0.131 12.30 0.185 19.45 995.5 714.8 843.5 
Sum 5228.3 5033.6 5069.0 
TABLE X2.6 $CSS_0$ and $CSS_{1a}$

$i$ $Y_i - X_i$ $w_i$ $w_i(Y_i - X_i)^2$ $w_iX_i$ $w_iY_i$ $w_i(Y_i - X_i - \bar{Y} + \bar{X})^2$
1 -1.69 6.67 19.1 163.8 152.5 2.16
2 -3.88 7.05 106.2 181.7 154.4 18.52
3 -2.36 6.35 35.3 163.7 148.7 0.06
4 -1.36 7.66 14.2 172.6 162.2 6.21
5 -2.42 4.90 28.7 144.6 132.7 0.12
6 -3.63 19.56 257.4 301.2 230.3 36.57
7 -3.27 11.37 121.7 225.9 188.7 11.63
8 -2.51 2.37 14.9 101.3 95.4 0.14
9 -2.58 8.66 57.6 192.0 169.7 0.88
10 -2.15 10.15 46.8 203.8 182.0 0.13
11 -2.65 3.08 21.7 115.7 107.5 0.47
12 -2.42 4.29 25.2 135.5 125.1 0.12
13 -1.15 13.45 17.8 221.6 206.1 16.54
14 -1.41 9.79 19.4 193.9 180.1 7.08
15 -1.17 19.45 26.5 261.9 239.2 23.20
Sum 134.80 $CSS_0$ = 812.46 2779.2 2474.5 $CSS_{1a}$ = 123.86
Wt Avg 20.62 18.36


X2.5 Calculate the Centered Sums of Squares (CSS) 


X2.5.1 Class 0: No correction. The first three columns of Table X2.6 display the computations from Eq 8 and Eq 9. As shown in the next-to-last line in the table, $CSS_0$ turns out to be 812.46.


X2.5.2 Class 1a: Constant correction. Table X2.6 also contains these computations. Note that $Y_i$ is smaller than $X_i$ for all samples, so it is not surprising that $CSS_{1a}$ is much smaller than $CSS_0$. The constant correction is $a = \bar{Y} - \bar{X} = 18.36 - 20.62 = -2.26$.
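Classes 0 and 1a can be sketched in Python from the rounded values printed in Tables X2.5 and X2.6, assuming Eq 8 and Eq 9 give $CSS_0 = \sum_i w_i (Y_i - X_i)^2$ and that the constant correction is the weighted-mean difference used in X2.5.2. Because the inputs are rounded, the sums agree with Table X2.6 to within a fraction of a unit rather than exactly; the function names are hypothetical.

```python
# Tables X2.5 and X2.6: rounded method means and weights
X = [24.56, 25.79, 25.78, 22.53, 29.51, 15.40, 19.87, 42.70,
     22.17, 20.09, 37.56, 31.55, 16.47, 19.81, 13.46]
Y = [22.87, 21.91, 23.43, 21.17, 27.10, 11.77, 16.60, 40.20,
     19.59, 17.94, 34.91, 29.12, 15.32, 18.40, 12.30]
W = [6.67, 7.05, 6.35, 7.66, 4.90, 19.56, 11.37, 2.37,
     8.66, 10.15, 3.08, 4.29, 13.45, 9.79, 19.45]


def css_no_correction(x, y, w):
    """CSS_0: weighted squared between-method differences, no correction."""
    return sum(wi * (yi - xi) ** 2 for wi, xi, yi in zip(w, x, y))


def constant_correction(x, y, w):
    """Class 1a: a = Y-bar - X-bar (weighted means) and the CSS about Y = X + a."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    a = y_bar - x_bar
    css = sum(wi * (yi - xi - a) ** 2 for wi, xi, yi in zip(w, x, y))
    return a, css


css_0 = css_no_correction(X, Y, W)
a_1a, css_1a = constant_correction(X, Y, W)
print(f"CSS_0 = {css_0:.2f}, a = {a_1a:.2f}, CSS_1a = {css_1a:.2f}")
```

Subtracting the weighted mean difference removes the common offset, which is why $CSS_{1a}$ drops so sharply when every $Y_i$ lies below its $X_i$.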


X2.5.3 Class 1b: Proportional correction.

X2.5.3.1 Because aromatics concentration has a true zero, and because max($Y_i$) = 40.2 > 23.54 = 2 min($Y_i$), it is appropriate to also consider a proportional correction. Table X2.7 shows the computations for the first two iterations. Starting with b = 1, the first iteration proceeds using the weights $\{w_i\}$ from Table X2.6.

$$A = \sum_i w_i^2 X_i Y_i s_{X_i}^2 = 13061.7 \qquad (X2.5)$$

$$B = \sum_i w_i^2 \left(X_i^2 s_{Y_i}^2 - Y_i^2 s_{X_i}^2\right) = 36172.7 \qquad (X2.6)$$

$$C = -\sum_i w_i^2 X_i Y_i s_{Y_i}^2 = -43026.6 \qquad (X2.7)$$

$$b = \frac{-B + \sqrt{B^2 - 4AC}}{2A} = 0.8982 \qquad (X2.8)$$


X2.5.3.2 As $|b - b_0| = 0.1018 > 0.001b$, where $b_0 = 1$ is the starting value, we shall iterate.
X2.5.3.3 From the second iteration:

$$b = \frac{-39535.0 + \sqrt{39535.0^2 - 4 \times 14436.9 \times (-47095.6)}}{2 \times 14436.9} = 0.8973 \qquad (X2.9)$$


X2.5.3.4 Now $|b - b_0| = 0.0009 < 0.001b$ and the iteration may stop. The final computation of $\{w_i\}$, and $CSS_{1b}$ = 158.79, are shown in the last two columns of Table X2.7.
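A Python sketch of the Class 1b iteration, assuming the weights take the form $w_i = 1/(b^2 s_{X_i}^2 + s_{Y_i}^2)$ (at b = 1 this gives weights close to those printed in Table X2.6) and that each pass solves the quadratic of Eq X2.5 to X2.8 with the weights frozen at the previous b. The stopping rule is the one used in the text, $|b - b_0| < 0.001b$. The inputs are the rounded values of Table X2.5, so the results match Table X2.7 only approximately; `weights` and `fit_proportional` are hypothetical names.

```python
import math

# Table X2.5: rounded method means and standard errors
X = [24.56, 25.79, 25.78, 22.53, 29.51, 15.40, 19.87, 42.70,
     22.17, 20.09, 37.56, 31.55, 16.47, 19.81, 13.46]
SX = [0.177, 0.181, 0.181, 0.170, 0.193, 0.140, 0.159, 0.234,
      0.168, 0.160, 0.219, 0.201, 0.145, 0.159, 0.131]
Y = [22.87, 21.91, 23.43, 21.17, 27.10, 11.77, 16.60, 40.20,
     19.59, 17.94, 34.91, 29.12, 15.32, 18.40, 12.30]
SY = [0.345, 0.330, 0.353, 0.319, 0.408, 0.177, 0.250, 0.606,
      0.295, 0.270, 0.526, 0.439, 0.231, 0.277, 0.185]


def weights(b, sx, sy):
    """Assumed weight form: w_i = 1 / (b^2 s_Xi^2 + s_Yi^2)."""
    return [1.0 / (b**2 * sxi**2 + syi**2) for sxi, syi in zip(sx, sy)]


def fit_proportional(x, sx, y, sy, tol=0.001, max_iter=50):
    """Class 1b sketch: fit Y = bX, re-solving A b^2 + B b + C = 0 each pass."""
    b = 1.0
    for _ in range(max_iter):
        w = weights(b, sx, sy)              # frozen at the previous b
        rows = list(zip(w, x, y, sx, sy))
        A = sum(wi**2 * xi * yi * s1**2 for wi, xi, yi, s1, s2 in rows)
        B = sum(wi**2 * (xi**2 * s2**2 - yi**2 * s1**2) for wi, xi, yi, s1, s2 in rows)
        C = -sum(wi**2 * xi * yi * s2**2 for wi, xi, yi, s1, s2 in rows)
        b_new = (-B + math.sqrt(B**2 - 4 * A * C)) / (2 * A)
        converged = abs(b_new - b) < tol * b_new   # |b - b_0| < 0.001 b
        b = b_new
        if converged:
            break
    w = weights(b, sx, sy)
    css = sum(wi * (yi - b * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    return b, css


b_1b, css_1b = fit_proportional(X, SX, Y, SY)
print(f"b = {b_1b:.4f}, CSS_1b = {css_1b:.2f}")
```

The positive root is taken because, with A > 0 and C < 0 here, it is the only positive slope.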


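X2.5.4 Class 2: Linear correction.

X2.5.4.1 Tables X2.8 and X2.9 demonstrate two iterations of the algorithm for fitting the linear model, in which $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$ are deviations from the weighted means. Starting with b = 1, the first iteration proceeds as shown in Tables X2.6-X2.8.

$$A = \sum_i w_i^2 x_i y_i s_{X_i}^2 = 1127.4 \qquad (X2.10)$$

$$B = \sum_i w_i^2 \left(x_i^2 s_{Y_i}^2 - y_i^2 s_{X_i}^2\right) = 2935.4 \qquad (X2.11)$$

$$C = -\sum_i w_i^2 x_i y_i s_{Y_i}^2 = -3941.6 \qquad (X2.12)$$

$$b = \frac{-B + \sqrt{B^2 - 4AC}}{2A} = 0.9765 \qquad (X2.13)$$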


X2.5.4.2 As $|b - b_0| = 0.0235 > 0.001b$, we shall iterate.
X2.5.4.3 From the second iteration, as shown in Table X2.9:

$$b = \frac{-2991.1 + \sqrt{2991.1^2 - 4 \times 1153.6 \times (-4021.8)}}{2 \times 1153.6} = 0.9767 \qquad (X2.14)$$
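TABLE X2.7 Iterating Class 1b

$i$ $w_i$ for $b = 1$ $w_i^2X_iY_is_{X_i}^2$ $w_i^2(X_i^2s_{Y_i}^2 - Y_i^2s_{X_i}^2)$ $w_i^2X_iY_is_{Y_i}^2$ $w_i$ for $b = 0.8982$ $w_i^2X_iY_is_{X_i}^2$ $w_i^2(X_i^2s_{Y_i}^2 - Y_i^2s_{X_i}^2)$ $w_i^2X_iY_is_{Y_i}^2$ $w_i$ for $b = 0.8973$ $w_i(Y_i - bX_i)^2$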
1 6.67 778.7 2462.3 2968.0 6.95 845.2 2672.7 3221.6 6.95 4.63 
2 7.05 923.9 2814.3 3057.5 7.38 1012.7 3084.8 3351.4 7.38 10.69 
3 6.35 801.4 2610.0 3032.9 6.61 870.3 2834.4 3293.6 6.62 0.54 
4 7.66 805.3 2274.9 2848.7 8.00 878.5 2481.9 3107.8 8.00 6.99 
5 4.90 718.5 2824.9 3198.9 5.08 772.3 3036.4 3438.4 5.08 1.84 
6 19.56 1363.8 1811.4 2181.5 21.14 1591.6 2114.1 2546.0 21.15 81.86 
7 11.37 1082.2 2290.6 2668.7 12.04 1213.8 2569.2 2993.3 12.05 17.19 
8 2.37 527.1 3270.9 3546.1 2.43 554.5 3440.9 3730.4 2.43 8.41 
9 8.66 922.3 2398.3 2839.6 9.09 1016.4 2642.9 3129.1 9.09 0.78 
10 10.15 945.4 2191.2 2711.0 10.68 1047.4 2427.8 3003.7 10.68 0.07 
11 3.08 596.7 3148.8 3441.9 3.17 632.3 3336.6 3647.2 3.17 4.49 
12 4.29 682.6 2904.4 3262.9 4.44 730.6 3108.8 3492.5 4.44 2.87 
13 13.45 960.7 1723.8 2434.5 14.23 1075.1 1929.1 2724.3 14.24 3.94 
14 9.79 883.4 2069.2 2684.1 10.28 974.4 2282.4 2960.7 10.29 3.83 
15 19.45 1069.9 1377.5 2150.3 20.79 1221.8 1573.0 2455.6 20.80 0.90 
Sum A = 13061.7 B = 36172.7 -C = 43026.6 A = 14436.9 B = 39535.0 -C = 47095.6 $CSS_{1b}$ = 158.79
b = 0.8982 b = 0.8973
TABLE X2.8 First Iteration of Class 2 Model Fitting
$i$ $w_i$ for $b = 1$ $w_iX_i$ $w_iY_i$ $w_i^2x_iy_is_{X_i}^2$ $w_i^2(x_i^2s_{Y_i}^2 - y_i^2s_{X_i}^2)$ $w_i^2x_iy_is_{Y_i}^2$
1 6.67 165.42 152.55 24.7 53.9 94.0 
2 7.05 181.71 154.37 30.1 124.2 99.5 
3 6.35 163.67 148.70 34.8 100.0 131.5 
4 7.66 172.58 162.17 9.1 8.5 32.2 
5 4.90 144.58 132.73 69.8 248.0 311.0 
6 19.56 301.23 230.26 258.9 1.5 414.1 
7 11.37 225.93 188.74 4.3 -5.6 10.6
8 2.37 101.33 95.39 148.1 861.1 996.4 
9 8.66 191.99 169.66 4.1 12.6 12.6 
10 10.15 203.81 182.02 0.6 1.7 1.7 
11 3.08 115.69 107.53 127.6 628.8 736.0 
12 4.29 135.47 125.06 87.4 338.2 417.9 
13 13.45 221.58 206.09 47.9 130.5 121.3 
14 9.79 193.91 180.11 -0.1 4.8 -0.2 
15 19.45 261.90 239.18 280.3 427.3 563.3 
Sum 134.80 2780.81 2474.55 A = 1127.4 B = 2935.4 -C = 3941.6
Avg $\bar{X}$ = 20.63 $\bar{Y}$ = 18.36
b = 0.9765
TABLE X2.9 Second Iteration of Class 2 Model Fitting
$i$ $w_i$ for $b = 0.9765$ $w_iX_i$ $w_iY_i$ $w_i^2x_iy_is_{X_i}^2$ $w_i^2(x_i^2s_{Y_i}^2 - y_i^2s_{X_i}^2)$ $w_i^2x_iy_is_{Y_i}^2$ $w_i$ for $b = 0.9767$ $w_i(y_i - bx_i)^2$
1 6.74 165.42 154.03 25.4 55.5 96.6 6.73 2.96 
2 7.12 183.69 156.05 31.0 127.7 102.5 7.12 16.02 
3 6.41 165.27 150.16 35.7 102.6 135.0 6.41 0.00 
4 7.74 174.36 163.84 9.4 8.9 33.4 7.74 6.93 
5 4.94 145.82 133.87 71.3 253.3 317.6 4.94 0.01 
6 19.92 306.70 234.44 266.8 1.1 426.7 19.92 44.10
7 11.52 229.00 191.30 4.3 -5.8 10.5 11.52 12.17 
8 2.39 101.95 95.96 150.1 872.9 1010.0 2.39 0.17 
9 8.76 194.20 171.61 4.3 13.1 13.2 8.76 0.70 
10 10.27 206.28 184.23 0.6 1.6 1.6 10.27 0.10 
11 3.10 116.49 108.27 129.6 638.8 747.8 3.10 0.00 
12 4.33 136.57 126.07 89.1 344.7 426.1 4.33 0.04 
13 13.63 224.52 208.84 48.6 133.1 123.3 13.63 14.00 
14 9.90 196.16 182.20 -0.1 4.7 -0.3 9.90 6.87 
15 19.76 266.00 242.92 287.6 438.9 578.0 19.75 16.95 
Sum 136.52 2812.43 2503.79 A = 1153.6 B = 2991.1 -C = 4021.8 $CSS_2$ = 121.03
Avg 20.60 18.34 
b = 0.9767 


X2.5.4.4 Now $|b - b_0| = 0.0002 < 0.001b$, and iteration may stop. The final computation of $\{w_i\}$ and $CSS_2$ = 121.03 are shown in the last two columns of Table X2.9. From Eq 26, $a = \bar{Y} - b\bar{X} = 18.34 - 0.9767 \times 20.60 = -1.78$.
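The Class 2 fit differs from Class 1b only in that the quadratic of Eq X2.10 to X2.13 uses the centered $x_i$ and $y_i$, with the weighted means recomputed from the current weights on every pass. A Python sketch under the same assumptions as for Class 1b (weights $1/(b^2 s_{X_i}^2 + s_{Y_i}^2)$, rounded Table X2.5 inputs, the $|b - b_0| < 0.001b$ stopping rule); `centered` and `fit_linear` are hypothetical names.

```python
import math

# Table X2.5: rounded method means and standard errors
X = [24.56, 25.79, 25.78, 22.53, 29.51, 15.40, 19.87, 42.70,
     22.17, 20.09, 37.56, 31.55, 16.47, 19.81, 13.46]
SX = [0.177, 0.181, 0.181, 0.170, 0.193, 0.140, 0.159, 0.234,
      0.168, 0.160, 0.219, 0.201, 0.145, 0.159, 0.131]
Y = [22.87, 21.91, 23.43, 21.17, 27.10, 11.77, 16.60, 40.20,
     19.59, 17.94, 34.91, 29.12, 15.32, 18.40, 12.30]
SY = [0.345, 0.330, 0.353, 0.319, 0.408, 0.177, 0.250, 0.606,
      0.295, 0.270, 0.526, 0.439, 0.231, 0.277, 0.185]


def centered(b, x, sx, y, sy):
    """Weights for slope b, weighted means, and deviations from those means."""
    w = [1.0 / (b**2 * sxi**2 + syi**2) for sxi, syi in zip(sx, sy)]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    return w, x_bar, y_bar, [xi - x_bar for xi in x], [yi - y_bar for yi in y]


def fit_linear(x, sx, y, sy, tol=0.001, max_iter=50):
    """Class 2 sketch: fit Y = a + bX using centered x_i, y_i in the quadratic for b."""
    b = 1.0
    for _ in range(max_iter):
        w, _, _, dx, dy = centered(b, x, sx, y, sy)
        rows = list(zip(w, dx, dy, sx, sy))
        A = sum(wi**2 * u * v * s1**2 for wi, u, v, s1, s2 in rows)
        B = sum(wi**2 * (u**2 * s2**2 - v**2 * s1**2) for wi, u, v, s1, s2 in rows)
        C = -sum(wi**2 * u * v * s2**2 for wi, u, v, s1, s2 in rows)
        b_new = (-B + math.sqrt(B**2 - 4 * A * C)) / (2 * A)
        converged = abs(b_new - b) < tol * b_new
        b = b_new
        if converged:
            break
    w, x_bar, y_bar, dx, dy = centered(b, x, sx, y, sy)
    a = y_bar - b * x_bar                      # intercept, as in Eq 26
    css = sum(wi * (v - b * u) ** 2 for wi, u, v in zip(w, dx, dy))
    return a, b, css


a_2, b_2, css_2 = fit_linear(X, SX, Y, SY)
print(f"a = {a_2:.2f}, b = {b_2:.4f}, CSS_2 = {css_2:.2f}")
```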
TABLE X2.10 Residuals

Sorted Rank, $i$ Original Sequence No. Residual, $e_i$ $v_i$ $p_i$ Term in Eq X2.18
1 6 -6.05 -2.01 0.022 -0.45
2 2 -4.30 -1.43 0.077 -1.01
3 7 -3.41 -1.13 0.130 -1.25
4 9 -0.94 -0.30 0.383 -1.21
5 11 -0.69 -0.21 0.416 -1.24
6 8 -0.38 -0.11 0.457 -1.17
7 5 -0.35 -0.10 0.460 -1.23
8 12 -0.34 -0.10 0.462 -1.39
9 3 -0.25 -0.06 0.475 -1.54
10 10 0.36 0.14 0.555 -1.52
11 1 1.47 0.51 0.696 -1.26
12 4 2.49 0.86 0.804 -1.08
13 14 2.66 0.91 0.820 -0.56
14 13 4.07 1.39 0.917 -0.30
15 15 4.82 1.64 0.949 -0.14


X2.6 Conduct Tests to Select the Most Parsimonious Bias 
Correction Class Needed 
X2.6.1 From Eq 27 compute: 


$$F = \frac{(CSS_0 - CSS_2)/2}{CSS_2/(S - 2)} = \frac{(812.46 - 121.03)/2}{121.03/13} = 37.13 \qquad (X2.15)$$


X2.6.2 The 95th percentile of the F distribution, with 2 and
13 degrees of freedom, is 3.81. As the computed F is larger 
than 3.81, we conclude that a bias correction (of class yet to be 
determined) will significantly improve the expected agreement 
between the two methods. 


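X2.6.3 As $CSS_{1a}$ is smaller than $CSS_{1b}$, the t-ratios of Eq 28 are:

$$t_1 = \sqrt{\frac{CSS_0 - CSS_{1a}}{CSS_2/(S - 2)}} = \sqrt{\frac{812.46 - 123.86}{121.03/13}} = 8.60 \qquad (X2.16)$$

and

$$t_2 = \sqrt{\frac{CSS_{1a} - CSS_2}{CSS_2/(S - 2)}} = \sqrt{\frac{123.86 - 121.03}{121.03/13}} = 0.55 \qquad (X2.17)$$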
X2.6.4 The 97.5th percentile of Student's t distribution, with 13 degrees of freedom, is 2.16. As $t_2$ is smaller than 2.16, we compare $t_1$ to the same percentile, as discussed in 6.5.3.3. $t_1$ exceeds 2.16, so we conclude that a constant bias correction is preferred to a linear (proportional + constant) bias correction. The preferred bias correction is to subtract (since a has a negative sign) 2.26 volume % aromatics from any Test Method D5580 result in order to predict a Test Method D5769 result on the same material. Note that the predicted Test Method D5769 result should be within the scope of Test Method D5769 in order for it to be meaningful.
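The parsimony statistics of X2.6 can be sketched in Python directly from the four CSS values, assuming Eq 27 and Eq 28 take the forms shown in Eq X2.15 to X2.17, with the better-fitting single-parameter class (here 1a) supplying $CSS_1$. The critical values are copied from X2.6.2 and X2.6.4 rather than computed. The final `if` encodes only the decision path taken in this example; the full selection rules, including the other outcomes, are those of 6.5.3, which this sketch does not reproduce.

```python
import math


def parsimony_statistics(css_0, css_1, css_2, S):
    """F-ratio of Eq X2.15 and t-ratios of Eq X2.16/X2.17 (forms as shown there)."""
    scale = css_2 / (S - 2)                      # residual mean square of class 2
    F = ((css_0 - css_2) / 2) / scale
    t_1 = math.sqrt((css_0 - css_1) / scale)     # single-parameter vs. no correction
    t_2 = math.sqrt((css_1 - css_2) / scale)     # linear vs. single-parameter
    return F, t_1, t_2


S = 15
css_0, css_1a, css_1b, css_2 = 812.46, 123.86, 158.79, 121.03
css_1 = min(css_1a, css_1b)                      # CSS_1a is the smaller here
F, t_1, t_2 = parsimony_statistics(css_0, css_1, css_2, S)

F_CRIT, T_CRIT = 3.81, 2.16                      # copied from X2.6.2 and X2.6.4
print(f"F = {F:.2f}, t_1 = {t_1:.2f}, t_2 = {t_2:.2f}")
if F > F_CRIT and t_2 < T_CRIT and t_1 > T_CRIT:
    print("constant (class 1a) correction: predicted Y = X + a, a = -2.26")
```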


X2.7 Test for Existence of Sample-Specific Biases 


X2.7.1 The CSS of the selected bias correction is 123.86, with S - 1 = 14 degrees of freedom. The 95th percentile of the chi-square distribution with 14 degrees of freedom is 23.68. As the CSS is larger, we conclude that there are likely sample-specific biases between the methods.


X2.8 Examine Residuals to Assess Reasonableness of 
Random Effect Assumption 


X2.8.1 The (standardized) residuals $e_i = \sqrt{w_i}\,(Y_i - \hat{Y}_i)$, where $\hat{Y}_i = X_i + a$ is the bias-corrected X-method result, are shown in Table X2.10. For example, the residual for the first sample (first in Tables X2.1-X2.9) is $\sqrt{6.67}\,(22.87 - (24.56 - 2.26)) = 1.47$, which is found in the eleventh row. (The table has been sorted in order of increasing $e_i$.) The $\{w_i\}$ are taken from Table X2.6, which is appropriate for the selected bias correction.
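The residual calculation of X2.8.1 can be sketched in Python using the Table X2.6 weights and a = -2.26. Because the inputs are the rounded printed values, individual residuals can differ from Table X2.10 in the second decimal, and nearly tied residuals (for example samples 5 and 12) may swap rank. The sum of the squared residuals is, by construction, the CSS of the selected correction tested in X2.7.1.

```python
import math

# Tables X2.5 and X2.6: rounded method means and Class 1a weights
X = [24.56, 25.79, 25.78, 22.53, 29.51, 15.40, 19.87, 42.70,
     22.17, 20.09, 37.56, 31.55, 16.47, 19.81, 13.46]
Y = [22.87, 21.91, 23.43, 21.17, 27.10, 11.77, 16.60, 40.20,
     19.59, 17.94, 34.91, 29.12, 15.32, 18.40, 12.30]
W = [6.67, 7.05, 6.35, 7.66, 4.90, 19.56, 11.37, 2.37,
     8.66, 10.15, 3.08, 4.29, 13.45, 9.79, 19.45]
A_CONST = -2.26  # selected constant correction, from X2.5.2

# standardized residual e_i = sqrt(w_i) * (Y_i - (X_i + a))
residuals = [math.sqrt(w) * (y - (x + A_CONST)) for x, y, w in zip(X, Y, W)]

# sort by increasing residual, as in Table X2.10
order = sorted(range(len(residuals)), key=lambda i: residuals[i])
for rank, i in enumerate(order, start=1):
    print(f"{rank:2d}  sample {i + 1:2d}  e = {residuals[i]:6.2f}")

# the sum of squared residuals is the CSS of the selected correction
print(f"sum of e_i^2 = {sum(e * e for e in residuals):.2f}")
```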


X2.8.2 Anderson-Darling Statistic: 

X2.8.2.1 From Eq A1.4 of Practice D6299, the residuals $\{e_i\}$ are again normalized. To avoid a conflict in notation, what are called $w_i$ in that practice are called $v_i = (e_i - \bar{e})/s_e$ here and in Table X2.10, where $\bar{e}$ = -0.06 is the mean of the $\{e_i\}$ and $s_e$ = 2.97 is their standard deviation. The $\{p_i\}$ are from tables of the standard normal distribution. From Eq A1.6 and A1.7 of Practice D6299,

$$A^2 = -\frac{\sum_{i=1}^{n}(2i - 1)\left[\ln p_i + \ln\left(1 - p_{n+1-i}\right)\right]}{n} - n = 0.361 \qquad (X2.18)$$

$$A^{*2} = A^2\left(1 + \frac{0.75}{n} + \frac{2.25}{n^2}\right) = 0.382 \qquad (X2.19)$$

X2.8.2.2 As $A^{*2}$ (0.382) is less than the 0.05 level critical value (0.752) for the Anderson-Darling statistic, the distribution of the residuals cannot be distinguished from the normal distribution.
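The Anderson-Darling computation can be sketched in Python from the Table X2.10 residuals, assuming Eq A1.6 and A1.7 of Practice D6299 take the forms shown in Eq X2.18 and X2.19, and that $s_e$ is the sample (n - 1) standard deviation. Here `math.erf` stands in for tables of the standard normal distribution; the function names are hypothetical.

```python
import math
import statistics

# Standardized residuals e_i from Table X2.10, in original sample order 1-15
E = [1.47, -4.30, -0.25, 2.49, -0.35, -6.05, -3.41, -0.38,
     -0.94, 0.36, -0.69, -0.34, 4.07, 2.66, 4.82]


def normal_cdf(z):
    """Standard normal cumulative probability via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def anderson_darling(values):
    """A^2 of Eq X2.18 and the adjusted A*^2 of Eq X2.19."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)                 # sample standard deviation
    p = [normal_cdf((v - mean) / sd) for v in sorted(values)]
    total = sum((2 * i - 1) * (math.log(p[i - 1]) + math.log(1.0 - p[n - i]))
                for i in range(1, n + 1))         # p[n - i] is p_(n+1-i)
    a2 = -total / n - n
    a2_star = a2 * (1.0 + 0.75 / n + 2.25 / n**2)
    return a2, a2_star


a2, a2_star = anderson_darling(E)
print(f"A^2 = {a2:.3f}, A*^2 = {a2_star:.3f}, normal at 0.05: {a2_star < 0.752}")
```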


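X2.8.3 Between Methods Reproducibility:

X2.8.3.1 Estimate the between methods reproducibility ($R_{XY}$) as follows:

$$R_{XY} = \sqrt{\frac{b^2R_X^2 + R_Y^2}{2}\left[1 + \frac{2 \times 1.96^2\,(CSS - S + k)\,S}{(S - k)\displaystyle\sum_{i=1}^{S}\frac{b^2R_{X_i}^2 + R_{Y_i}^2}{b^2s_{X_i}^2 + s_{Y_i}^2}}\right]} \qquad (X2.20)$$

For the selected constant correction, b = 1 and k = 1, and

$$\sum_{i=1}^{S}\frac{b^2R_{X_i}^2 + R_{Y_i}^2}{b^2s_{X_i}^2 + s_{Y_i}^2} = \frac{0.2792^2 \times 24.56 + 0.1292^2 \times 22.87^2}{0.177^2 + 0.345^2} + \cdots + \frac{0.2792^2 \times 13.46 + 0.1292^2 \times 12.30^2}{0.131^2 + 0.185^2} = 70.80 + \cdots + 69.57 = 1059.57$$

so

$$R_{XY} = \sqrt{\frac{0.2792^2X + 0.1292^2Y^2}{2}\left[1 + \frac{2 \times 1.96^2 \times (123.86 - 15 + 1) \times 15}{(15 - 1) \times 1059.57}\right]} = \sqrt{\frac{0.2792^2X + 0.1292^2Y^2}{2} \times 1.85356}$$

$$= 1.36\sqrt{0.03898X + 0.008346Y^2} = \sqrt{0.07225X + 0.01547Y^2}$$

X2.8.3.2 Because of the sample-specific biases (which could be due to the need for further standardization in one of the methods, as noted earlier), this is 36 % larger than the root mean square of the individual reproducibilities.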
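The between-methods reproducibility of Eq X2.20 can be sketched in Python from the rounded Table X2.5 values and the Table X2.3 reproducibility functions. Two points are assumptions: the denominator is taken as (S - k), which agrees with the (15 - 1) used in the worked numbers for k = 1, and b = 1 for the selected constant correction. The names `inflation_factor` and `R_XY` are hypothetical.

```python
import math

# Table X2.5: rounded method means and standard errors
X = [24.56, 25.79, 25.78, 22.53, 29.51, 15.40, 19.87, 42.70,
     22.17, 20.09, 37.56, 31.55, 16.47, 19.81, 13.46]
SX = [0.177, 0.181, 0.181, 0.170, 0.193, 0.140, 0.159, 0.234,
      0.168, 0.160, 0.219, 0.201, 0.145, 0.159, 0.131]
Y = [22.87, 21.91, 23.43, 21.17, 27.10, 11.77, 16.60, 40.20,
     19.59, 17.94, 34.91, 29.12, 15.32, 18.40, 12.30]
SY = [0.345, 0.330, 0.353, 0.319, 0.408, 0.177, 0.250, 0.606,
      0.295, 0.270, 0.526, 0.439, 0.231, 0.277, 0.185]


def R_X(x):
    return 0.2792 * math.sqrt(x)    # Table X2.3, method X reproducibility


def R_Y(y):
    return 0.1292 * y               # Table X2.3, method Y reproducibility


def inflation_factor(x, sx, y, sy, css, k, b=1.0):
    """Sum over samples and the bracketed factor of Eq X2.20 ((S - k) assumed)."""
    S = len(x)
    total = sum((b**2 * R_X(xi)**2 + R_Y(yi)**2) / (b**2 * sxi**2 + syi**2)
                for xi, sxi, yi, syi in zip(x, sx, y, sy))
    factor = 1.0 + 2.0 * 1.96**2 * (css - S + k) * S / ((S - k) * total)
    return total, factor


total, factor = inflation_factor(X, SX, Y, SY, css=123.86, k=1)


def R_XY(x, y, b=1.0):
    """Between-methods reproducibility at levels x and y."""
    return math.sqrt((b**2 * R_X(x)**2 + R_Y(y)**2) / 2.0 * factor)


print(f"sum = {total:.2f}, factor = {factor:.5f}, sqrt = {math.sqrt(factor):.2f}")
print(f"R_XY at X = 20, Y = 18: {R_XY(20.0, 18.0):.3f}")
```

The factor multiplies the mean of the two squared reproducibilities, so its square root (about 1.36) is the inflation attributed to the sample-specific biases in X2.8.3.2.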


SUMMARY OF CHANGES 


Subcommittee D02.94 has identified the location of selected changes to this standard since the last issue 
(D6708 — 19) that may impact the use of this standard. (Approved Nov. 1, 2019.) 


(1) Revised subsections 1.1, 1.7, 7.1, and 7.2. (2) Added subsection 1.8. 


Subcommittee D02.94 has identified the location of selected changes to this standard since the last issue 
(D6708 — 18) that may impact the use of this standard. (Approved May 1, 2019.) 


(1) Revised 1.5, 3.1.1, 3.1.3, 5.3, 6.8, and 6.8.1. (2) Revised Note 3. 
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